I. CONCEPT OF TEST RELIABILITY


The term reliability coefficient was introduced by Spearman in
1904⁸ to denote the correlation between scores made on comparable
forms of a test. The term has been subsequently applied to the corre- 
lation between repetitions of the identical test, and to the correlation 
between scores on the two halves of the same test. The correlation 
of halves has been computed either by correlating the score on odd 
items with that on even items, or occasionally, by correlating scores 
on the first and second halves of the test. The latter represents simply 
the correlation between different forms administered in immediate 
succession. From the correlation of halves, the reliability coefficient 
is ordinarily estimated by means of the Spearman-Brown prophecy 
formula. 

The question has been frequently raised, and justly so, as to
whether these various coefficients should all be designated by the
blanket term of “reliability coefficient.” The results obtained by
Woodyard, Foran, and others show that the reliability coefficient
is consistently higher when computed from identical retests than when
computed from parallel forms of a test. It has been suggested by
Kelley that the correlation between identical retests constitutes an
upper limit of reliability while that between comparable forms
constitutes a lower limit. Turning to the reliability coefficient as estimated
from the correlation of halves, one finds a different result again. This
coefficient is ordinarily higher than that found by either of the other
two methods. Weidemann has advocated the use of such terms as
“consistency coefficient” and “self-consistency coefficient,” to refer
to retest correlation and odd-even correlation, respectively, in place
of the common term “reliability coefficient.”

* This is one of a series of studies made possible through a grant from the
Council for Research in the Social Sciences of Columbia University.

The use of more differentiating and descriptive terms as contrasted 
with the current practice of labeling any measure of consistency a
“reliability coefficient,” would indeed make discussions of reliability
more intelligible. It cannot, however, settle the question of just what 
each measure means or when it should be used. 

The reliability of a test is generally taken to be an index of the 
consistency of the test as a measuring instrument. It follows then, 
that perfect reliability is not in any sense synonymous with identity 
of response on different occasions. A discrepancy in score on suc- 
cessive retests may simply mean that the test is serving its function 
as an accurate and sensitive index of actual changes in the subject. 
An analogy suggested by Skaggs illustrates this point. One does
not measure the reliability of a thermometer by comparing temperature
readings on different days. The thermometer may be perfectly
reliable and still give very different readings on successive days. Such
fluctuations in daily temperature readings would be of interest if one
wished to determine the reliability of an estimate of daily temperature
from a single day’s reading, i.e., the consistency of the temperature,
not of the thermometer. Likewise, the discrepancies in score on different
occasions show how variable subjects are in the function tested,
or how susceptible the subject’s performance is to extraneous influences.

Woodrow has shown that the daily variations in performance
are far in excess of what would be expected from the operation of
chance factors alone. In the data analyzed by Woodrow, the sigmas
of the daily averages around the general average always exceeded the
average of the daily sigmas divided by √N. The latter is the standard
error of the average and should correspond with the obtained sigma
of the daily averages, if daily variations resulted from the same factors
which produce variation within a test period. Woodrow summarizes
his findings with the statement that: “The responses on different days
clearly are not all of the same category; they belong to different
statistical populations” (p. 249).
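The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `daily_variation_check`, the equal number of scores per day, and the simulated data are all hypothetical and are not taken from Woodrow's study.

```python
import math
import random
import statistics


def daily_variation_check(scores_by_day):
    """Compare the sigma of daily averages with its chance expectation.

    scores_by_day: list of lists, one inner list of N scores per day.
    Returns (sigma of the daily averages, average daily sigma / sqrt(N)).
    """
    n = len(scores_by_day[0])
    if any(len(day) != n for day in scores_by_day):
        raise ValueError("every day must contribute the same number of scores")
    daily_means = [statistics.fmean(day) for day in scores_by_day]
    daily_sigmas = [statistics.pstdev(day) for day in scores_by_day]
    observed = statistics.pstdev(daily_means)
    expected = statistics.fmean(daily_sigmas) / math.sqrt(n)
    return observed, expected


# Hypothetical data: 10 days, 25 scores per day, with a genuine
# day-to-day shift in the subject's level (an assumption for illustration).
rng = random.Random(0)
days = []
for _ in range(10):
    day_level = rng.gauss(50.0, 5.0)
    days.append([rng.gauss(day_level, 4.0) for _ in range(25)])

observed, expected = daily_variation_check(days)
print(f"sigma of daily averages:        {observed:.2f}")
print(f"average daily sigma / sqrt(N):  {expected:.2f}")
```

When the first figure clearly exceeds the second, the days differ by more than within-period variation would produce, which is the pattern the text attributes to Woodrow's data.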

It seems somewhat paradoxical to label a test unreliable simply 
because it may be a very sensitive measure of a phenomenon exhibiting 
marked daily variations. The whole argument hinges upon the fact 
that the subject does not remain constant during successive retests. 










Influence of Practice 323 


The only technique whereby one can approximate constancy of con- 
ditions in the subject in finding test reliability is that of correlating 
scores on odd and even items. The effects of variation in the subject 
during even the short period of the test tend to be equalized by the 
temporal arrangement of odd and even items. This method seems, 
therefore, to give most nearly the reliability of the measuring instru- 
ment, free from extraneous changes. 

The reliability of a test may be described further in terms of the 
adequacy with which the test samples the total performance which 
it attempts to measure. This is essentially the definition of reliability
given by Kelley. This concept of reliability finds support in
the familiar fact that, other things being equal, and with certain 
limitations,* the larger the number of items comprising a test, the 
more reliable will that test be. The more items there are in a test, 
the greater are the chances that the test constitutes a representative 
sampling of the ability measured. The odd-even correlation measures 
the consistency between two random samplings of items in the test. 
If each sample of items is adequate and representative, the correlation 
between the two will be high. 

If one tries to adopt any other definition of reliability, logical 
difficulties will be met at the outset. For example, it would be absurd 
to conclude that a reliable measure of certain functions cannot be 
devised, because the functions in question exhibit marked variations 
on different occasions. It is just for the measurement of such a 
variable function that one would want a highly reliable measuring 
instrument. Likewise, viewing the problem from a slightly different 
angle, one finds that the more sensitive and reliable the measuring 
instrument, the more thoroughly will it detect fine variations from 
time to time in the function measured. 

The correlation of comparable forms of a test administered on different
occasions reflects both the unreliability of the test and the daily
fluctuations of the trait measured. Paulsen suggests that the test-retest
coefficient be corrected for ‘attenuation,’ using the odd-even
correlation as a reliability coefficient. The resulting corrected
coefficient may then be used as an index of the stability of the trait
itself from day to day, for which Paulsen proposes the term “Coefficient
of Trait Variability.” It should be pointed out that the distinction
between daily fluctuations in performance and test reliability
does not imply that the former depends upon the individual alone,
and the latter upon the test alone. In both cases, one is dealing with
responses of the individual to the test situation. In the former case,
the chief concern is with the individual’s responses to the objectively
same stimuli on different occasions; in the latter, with the individual’s
responses to different stimuli selected to sample the same general field
of activity.

* Cf., e.g., Lanier.
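The correction mentioned above can be written out with the standard correction for attenuation. The symbols are assumptions introduced here for illustration, not notation from the text: r_{12} is the test-retest correlation, and r_{11} and r_{22} are the odd-even reliabilities obtained on the first and second occasions. Whether Paulsen used exactly this form is not stated in the passage.

```latex
% Standard correction for attenuation (symbols are illustrative assumptions)
r_{\text{corrected}} = \frac{r_{12}}{\sqrt{r_{11}\, r_{22}}}
% Hypothetical figures: r_{12} = .70, r_{11} = r_{22} = .85
% give r_corrected = .70 / .85, or about .82
```

With these hypothetical figures, a retest correlation of .70 would correspond to a trait stability of about .82 once the unreliability of the test itself is set aside.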

A more refined technique for determining reliability has been
suggested by Cureton. This technique can be applied when three or
more forms of a test are used. The intercorrelations of the three forms,
1, 2, and 3, are first computed. Then the reliability of form 1 may be
found by the formula

$$R_1 = \frac{r_{12}\, r_{13}}{r_{23}}$$

The same can, of course, be done
for the other forms. This formula will be recognized as the square of
Spearman’s $r_{ag}$ formula, and it is, in fact, a special case of that formula,
in which the specific factors can be regarded entirely as errors of
measurement. The square root of the value obtained would give the
correlation of one form with what is common to all three, which
corresponds to the general definition of the index of reliability. Dunlap
subsequently pointed out that, before this formula can be applied, it
must first be shown that the test forms satisfy the tetrad criterion,
since the use of this formula assumes that there is only one common
factor throughout the test forms.
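As a minimal sketch of the three-form computation just described, the following applies the formula to each form in turn by permuting the indices. The function names and the intercorrelations are hypothetical; the figures were chosen so that a single common factor with loadings .9, .8, and .7 underlies the three forms, which is the condition the formula assumes.

```python
import math


def triad_reliability(r_ab, r_ac, r_bc):
    """Reliability of form a from the intercorrelations of forms a, b, c.

    Implements R_a = r_ab * r_ac / r_bc, the formula given in the text.
    """
    if r_bc == 0:
        raise ValueError("the correlation in the denominator must be nonzero")
    return r_ab * r_ac / r_bc


def all_form_reliabilities(r12, r13, r23):
    """Apply the same formula to forms 1, 2, and 3."""
    return {
        1: triad_reliability(r12, r13, r23),
        2: triad_reliability(r12, r23, r13),
        3: triad_reliability(r13, r23, r12),
    }


# Hypothetical intercorrelations (illustrative only).
reliabilities = all_form_reliabilities(r12=0.72, r13=0.63, r23=0.56)
for form, value in reliabilities.items():
    # The square root is the index of reliability mentioned in the text.
    print(form, round(value, 4), round(math.sqrt(value), 4))
```

The sketch does not check the tetrad criterion; with real data that check would have to come first, as Dunlap's remark indicates.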

Cureton criticized the use of the split-half technique in estimating
reliability, on the grounds that it gives only the “test error” free from
the “response error” (p. 18). Disregarding the possible implications
suggested by these terms, namely that one can distinguish between the
reliability of the test and of the subject per se, we can still take issue
with this statement. If the split-half technique yields only one kind
of reliability and the retest technique a composite of two, one should
certainly select the former as easier to interpret. Dunlap, in discussing
the use of Cureton’s technique, suggests that it be used on the
different parts of a test administered at a single sitting, and again
points out that an individual’s “true” score means his ability at the
time of testing, including all the mental and physical influences that
affect his performance at the time.

In addition to the theoretical considerations discussed in the 
preceding pages, there are various methodological reasons for prefer- 
ring the odd-even correlation to the correlation of parallel forms. 
The difficulties encountered when the retest method is used are too
well known to require elaboration at present. Some of the chief
obstacles are: (1) general adaptation or specific practice effect from first 
to second testing; (2) the correlation of errors* which results when the 
same or very similar test forms are used; (3) the possibility that the 
different forms are not strictly comparable, but may be measuring 
different functions; (4) finally, the degree to which conditions should 
be kept constant on the successive retests is a point regarding which 
one is usually quite vague and inconsistent. All investigators would 
probably agree that the subject should, on both occasions, be in good 
health, free from any illness or physical handicap and from undue 
emotional strain. This, however, is not only difficult to ascertain for 
each individual when a large number of subjects is used, but it is a 
very vague criterion indeed. Just how great a variation in general 
health and well-being should one permit? Just how strong must an 
emotional disturbance be before it is considered “unduly strong”?


II. RELATION OF RELIABILITY TO PRACTICE 


Symonds in 1928 reported a list of twenty-five factors which may
affect the size of a “reliability coefficient,” using the term in a general
way to cover all of the measures commonly classed under reliability.
One of the factors mentioned is practice. After a brief summary
of some experimental data obtained from published studies, Symonds
concludes that: “At present the available evidence leads us to believe
that position of a function on the curve of learning has little relation
to the reliability or consistency of that function” (p. 87).

The experimental data on the effect of practice upon reliability
are surprisingly meagre. In relatively few experiments on practice
has there been any attempt to measure the reliability of the tests.
In some studies, such as those of Chapman and Race, general estimates
of the reliability of the tests are essayed, but with no differentiation
of earlier and later trials. What few studies do report reliability
at different stages of practice yield inconsistent conclusions. Gates
put twenty-three college women through twenty-two trials of each
of four simple tests and twenty-nine trials of a fifth test. Reliability
coefficients were computed by correlating the median scores of successive
sets of three or four trials. These correlations showed a
fairly consistent trend to rise with practice. Slocombe gave eight
parallel forms of an analogies test to seventy-five London schoolboys,
the forms being administered one week apart. Correlations
between successively administered forms showed no tendency to rise
from the second to the eighth week. Gundlach gave thirty-nine
college students twenty-five trials of each of four tests. Correlation
coefficients were computed for each pair of adjacent trials. Again
no consistent trend was apparent.

* Cf. Kelley, Statistical Method.

Kincaid reports initial and final reliability for practice in Braille
writing and dart throwing. In the former, thirty-one subjects were
given forty-seven trials. The initial and final reliabilities were computed
by correlating trials one and two and trials forty-six and forty-seven,
respectively; the coefficient rose from .81 to .91. In the latter
task, twenty-eight subjects were given forty trials and reliabilities 
were again computed by correlating the first two and the last two 
trials; this time the coefficient dropped from .84 to .79. Kincaid 
also reports similar reliability coefficients which she computed from 
the data of eight published studies. These coefficients, together 
with tasks used, number of subjects, and trials correlated in obtaining 
initial and final reliability, are reproduced in Table I. Kincaid 
computed Spearman rank-difference correlations for every study, 
with the exception of McCall’s, for which she employed the Pearson- 
Bravais product-moment formula. It will be seen that, in thirteen 
of the twenty-two experiments reported in Table I, there is a rise 
in reliability coefficient from initial to final stages of practice; in one 
case there is no change; and in the remaining eight, a drop. 

It will be noted that in no case was the measure computed a relia- 
bility coefficient as defined in Section I above. Furthermore, the 
specific details of technique used in arriving at the estimate of relia- 
bility, such as which trials or combinations of trials were to be corre- 
lated, differed from one study to another. It is not surprising, 
therefore, to find that the results are inconsistent and do not reveal 
any special trend. In a practice experiment, the use of the retest 
correlation seems especially ill-suited to the purpose of estimating 
reliability, since the scores must change on successive trials—otherwise 
one would hardly call it practice—and training does not affect all 
subjects equally. Another complicating and uncontrolled feature is 
thus added to the picture in such a set-up. 

Further explanation of the lack of corroboration among studies
reporting reliabilities at different stages of practice may be found
in a number of other discrepancies among such studies. In some cases,
the same test form is given repeatedly; in others, parallel forms are
used. The variety of forms employed, even though apparently very
similar, may introduce marked variations in reliability, as will be shown
below. Secondly, the extent of the practice effect is a significant
factor. Whether the subjects are given eight trials one week apart,
or twenty trials in immediate succession will certainly make a difference.
Thirdly, the type of test used may alter the results. Some
tests change radically in their nature in the course of practice, a
circumstance which might either raise or lower the reliability spuriously.
Fourthly, the distribution of practice, the spacing of the trials, may also
affect the results.


TABLE I. RELIABILITY COEFFICIENTS REPORTED BY KINCAID

[Table body not recoverable from the source. Surviving fragments: a column headed “Author* and test”; a first entry “Thorndike, 1908, Multiplication”; and a row fragment reading “28 (1) (2); (8) (9)”.]

* The studies are arranged in chronological order.

† The trials correlated for initial and final reliability are separated by a
semicolon. When two or more trials are given within the parentheses, it indicates that
their average was used in computing the correlation.


III. THE PRESENT EXPERIMENT* 


The object of the present investigation was to determine the 
influence of practice upon test reliability, the latter being defined 
as the consistency of the test as a measuring instrument. The tests
included:

1. A-cancellation test, consisting of twenty-seven rows of capital 
letters with ten A’s interspersed in random order in each row. Five 
different forms of this test were constructed, differing only in the 
arrangement of letters. 

2. Hidden Words test containing seventy four-letter English words 
spelled backwards and scattered in pied type. Ten forms of this test 
were constructed. 

3. Pyle Symbol-digit Substitution test—Subjects were directed to 
begin on the left half of the blank on odd trials and on the right half 
on even trials. (These are referred to as L and R forms, respectively, 
in Table II.) 

4. Nonsense Syllable Vocabulary test consisting of a key sheet with 
fifty pairs of three-letter nonsense syllables, and five alternate forms of 
test sheets containing only the first member of each pair of syllables, 
the second to be filled in by the subject by reference to the key. 

All of the practice for one test was administered during a single
sitting. The tests were given by the time-limit method, four minutes
being allowed for each trial of Hidden Words, two minutes for each
trial of the other tests. The number of trials given was fifteen for
Hidden Words and twenty for each of the other tests. A one-minute
rest period intervened between trials in each case. At the end of
each quarter of the time allowed for a trial, subjects were signaled by
the experimenter to “mark.” It was thus possible to compute four
separate scores on each trial, which were subsequently combined into
two measures, by adding the two odd quarters and the two even quarters.
This grouping of quarters was found preferable to combining
first with fourth and second with third quarters, since both practice
and fatigue were operative within each trial, which made the two
middle quarters higher than the end quarters. The two measures
thus obtained for each trial were correlated, using the product-moment
formula, and the reliability coefficient estimated therefrom by the
Spearman-Brown formula.

* This experiment was conducted as part of a more extensive project on the
effects of practice. For fuller description of tests and details of procedure, the
reader is referred to the report of that project.¹
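The scoring procedure for a single trial can be sketched as follows. This is an illustration, not the author's computation: the function names and the quarter scores are hypothetical, and the step-up uses the usual half-to-whole form of the Spearman-Brown formula, 2r / (1 + r).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown_full(r_half):
    """Estimate whole-test reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)


def trial_reliability(quarter_scores):
    """quarter_scores: one (q1, q2, q3, q4) tuple per subject for one trial."""
    odd = [q1 + q3 for q1, _, q3, _ in quarter_scores]    # quarters 1 and 3
    even = [q2 + q4 for _, q2, _, q4 in quarter_scores]   # quarters 2 and 4
    return spearman_brown_full(pearson_r(odd, even))


# Hypothetical quarter scores for five subjects on one trial.
scores = [
    (10, 12, 13, 11),
    (8, 9, 9, 8),
    (15, 16, 17, 15),
    (11, 12, 12, 12),
    (9, 11, 12, 10),
]
print(round(trial_reliability(scores), 4))
```

Pairing quarters 1 and 3 against 2 and 4 follows the grouping the text prefers, since each half then contains one early and one late quarter.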

The subjects of this experiment were college students of both
sexes. The number of subjects who participated in the experiment in
each of the tests was as follows:

[The list giving the number of subjects for each test is illegible in the source.]
TABLE II. THE EFFECT OF PRACTICE UPON TEST RELIABILITY

         Cancellation      Hidden Words      Symbol-digit      Vocabulary
Trial    Form     r        Form     r        Form     r        Form     r
  1       3     .7662       3     .5423       L     .8422       3     .7812
  2       4     .8348       4     .6979       R     .8348       4     .6137
  3       5     .8526       5     .7750       L     .8526       5     .7854
  4       1     .8185       6     .5942       R     .8185       1     .6107
  5       2     .7862       7     .7615       L     .7862       2     .6950
  6       3     .8462       8     .7527       R     .8462       3     .7182
  7       4     .8812       9     .8486       L     .8812       4     .7118
  8       5     .8409      10     .8254       R     .8409       5     .7893
  9       1     .8383       1     .8336       L     .8383       1     .6467
 10       2     .8825       2     .8564       R     .8825       2     .7195
 11       3     .9110       3     .8441       L     .9110       3     .7554
 12       4     .8573       4     .8429       R     .8573       4     .6417
 13       5     .9105       5     .9101       L     .9105       5     .7375
 14       1     .8565       6     .8402       R     .8565       1     .5908
 15       2     .8839       7     .8461       L     .8839       2     .7940
 16       3     [illegible]                   R     .8883       3     .8483
 17       4     [illegible]                   L     .8140       4     .7971
 18       5     [illegible]                   R     .8870       5     .7884
 19       1     [illegible]                   L     .8548       1     .7594
 20       2     .8981                         R     .8498       2     .7483
In Table II the reliability coefficients are presented for each trial
of the four tests, as well as the particular form of the test used in 
each trial.* A cursory examination of this table reveals a definite 
trend towards a rise in reliability with practice. This trend is clearly 
indicated in Cancellation and Hidden Words, whereas in Symbol-digit 
and Vocabulary there seems to be some doubt. If we confine our 
comparisons to initial and final trials, Symbol-digit shows only a 
negligible rise and Vocabulary a negligible drop. 

In order to obtain a more stable picture of the general trend, the 
average reliability coefficients of the first five and last five trials of 
each test were computed. In Cancellation and Vocabulary in which 
five forms were used in rotation, these averages included the first and 
last trials, respectively, of each form. The average correlations were 
arrived at by first transmuting each correlation coefficient into its 
corresponding value in Fisher’s z-function (5, pp. 175-184), averaging 
the z-values, and then transmuting this average back into an r-value. 
Since successive correlation units do not represent equal degrees of 
difference in relationship, a direct average of various correlations 
would not give a true picture of the trend of relationship. The 
following average values were obtained by the method described 
above: 


TABLE III. AVERAGE RELIABILITY COEFFICIENTS

Average of:           Cancellation   Hidden Words   Symbol-digit   Vocabulary
First five trials        .8076          .6858          .8306          .7064
Last five trials         .8717          .8591          .8643          .7531
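The averaging method described above, transmuting each r into Fisher's z, averaging the z-values, and transmuting back, can be sketched as follows. The function name and the sample coefficients are hypothetical; the sketch uses the standard identities z = atanh(r) and r = tanh(z) and is not a reproduction of the author's worksheet.

```python
import math


def average_correlation(rs):
    """Average correlation coefficients through Fisher's z-transformation."""
    if not rs:
        raise ValueError("at least one coefficient is required")
    if any(abs(r) >= 1 for r in rs):
        raise ValueError("coefficients must lie strictly between -1 and 1")
    zs = [math.atanh(r) for r in rs]          # r -> z
    return math.tanh(sum(zs) / len(zs))       # mean z -> r


# Hypothetical coefficients (illustrative only).
sample = [0.20, 0.90]
print("z-based average:   ", round(average_correlation(sample), 4))
print("arithmetic average:", round(sum(sample) / len(sample), 4))
```

The z-based average lies above the arithmetic mean when the coefficients are positive and unequal, because the transformation stretches the upper end of the scale.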

















It will be seen that there is a clear-cut rise from the first to the last 
average in each test. The general trend towards a rise in reliability 
with practice seems, then, to be fairly well established. The next 
problem now is to discover what factors have brought about the 
discrepancies found in specific instances, such as the difference in 
amount of rise in the different tests, or the fluctuations from trial to 
trial. 

The first factor to be considered is amount of improvement. Mere
repetition of a test does not raise its reliability. There must be a
genuine practice effect, shown by a rise in mean and standard deviation,
before a rise in reliability will occur. In Table IV will be found
the means and standard deviations of the first and last trials of each
test.

* The fact that each series was begun with form three may appear rather
whimsical. This was necessary since forms one and two had been used as preliminary
tests in connection with a related study (cf.¹).


TABLE IV. MEANS AND STANDARD DEVIATIONS

Trial        Cancellation   Hidden Words   Symbol-digit   Vocabulary
Mean
  First         101.90          18.44          59.48          26.63
  Last          139.98          42.04         115.67          37.57
SD
  First          11.91           5.11          12.34           3.89
  Last           16.52          10.88          19.91           4.88

















A direct comparison of amount of improvement from test to test 
is impossible since the units differ for each test. A comparison of 
percentage improvement or any other ratio measure would be mis- 
leading since the test zero does not represent absolute zero ability 
in the trait measured. Certain rough comparisons can, however, be 
made. The test showing the greatest rise in reliability is Hidden 
Words; that showing the least is Vocabulary. An examination of 
Table IV will show that, of these two tests, Hidden Words has the 
lower initial average and the higher final average. Whatever the 
difference in units, such a difference should have operated to make 
the final average of Hidden Words the lower of the two, since its initial 
average is lower. It is apparent, then, that the improvement in 
Hidden Words was much greater than that in Vocabulary, a fact which 
corresponds with the respective trends in the reliability coefficients 
of the two tests. 

The results obtained with the Symbol-digit test suggest a second 
factor which may affect the change in reliability from initial to final 
trial. This test, although showing a marked practice effect, exhibits 
the smallest rise in reliability of the four tests, the average reliability 
of the initial five and final five trials being .8306 and .8643, respec- 
tively. Further analysis shows that this apparent inconsistency may 
be attributed very largely to the difference in initial reliability of the 
various tests. The higher the original reliability, the less the chances 
that any factor introduced into the test situation will raise it further. 
It is very difficult for a test as short as those used in the present
experiment to have a reliability much over .80 under any conditions.
Furthermore, it will be recalled that successive correlation units do not 
represent equal increments in degree of relationship. As the size of a 
correlation coefficient increases, each unit becomes successively larger. 
Hence a rise in reliability coefficient from .83 to .86 is a much larger 
increase in actual degree of relationship, than, let us say, a rise from 
.63 to .66. 
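One way to make this comparison concrete is to express both rises on Fisher's z scale, which the text has already used for averaging. Treating z as the measure of "degree of relationship" is an assumption made here for illustration; the function name `z_gain` is hypothetical.

```python
import math


def z_gain(r_before, r_after):
    """Change in Fisher's z corresponding to a change in r."""
    return math.atanh(r_after) - math.atanh(r_before)


high = z_gain(0.83, 0.86)   # rise near the top of the scale
low = z_gain(0.63, 0.66)    # equal rise in r, lower on the scale
print(f"z gain, .83 to .86: {high:.3f}")
print(f"z gain, .63 to .66: {low:.3f}")
print(f"ratio: {high / low:.2f}")
```

On this scale the rise from .83 to .86 is roughly twice as large as the rise from .63 to .66, although both are three hundredths in r.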

A third factor to be taken into account is the possible difference 
in reliability of parallel forms of a test. The importance of this factor 
in producing what might otherwise seem haphazard fluctuations from 
trial to trial is usually underestimated. In Table II (p. 329), the 
form used in each trial has been indicated, together with the relia- 
bility coefficient of that trial. In order to facilitate analysis, the data 
for Cancellation and Vocabulary, in which each form was repeated 
four times, have been regrouped in Table V. 


TABLE V. INCREASE IN RELIABILITY FROM FIRST TO LAST ADMINISTRATION OF EACH FORM

                Cancellation                       Vocabulary
Form     1st     2nd     3rd     4th        1st     2nd     3rd     4th
 1      .8185   .8383   .8565   .8548      .6107   .6467   .6908   .7594
 2      .7862   .8825   .8839   .8981      .6950   .7195   .7940   .7483
 3      .7662   .8462   .9110   .8839      .7812   .7182   .7554   .8483
 4      .8348   .8812   .8573   .8140      .6187   .7118   .6417   .7971
 5      .8526   .8409   .9105   .8870      .7854   .7893   .7857   .7884





























The results now appear surprisingly consistent. The last trial 
with each form, with only one exception (viz., Cancellation, form 4), 
has a higher reliability than the first trial with the same form. 
Furthermore, in the large majority of cases, each trial with a given 
form has a higher reliability than the trial immediately preceding it, 
with the same form. There are only five exceptions to this trend in 
each of the two tests, these exceptions having been indicated in 
italics in Table V. 

A similar analysis of the data cannot be made for Hidden Words, 
since ten different forms of this test were used in only fifteen trials, so 
that five of the forms were used only once, and the remaining five, 
twice. The comparison of initial and final reliability in those forms 
which were repeated twice does, however, corroborate fully the con- 
clusions reached with Cancellation and Vocabulary, the final reliabili- 
The data for








Form                     3       4       5       6       7
Initial reliability    .5423   .6779   .7750   .5942   .7615
Final reliability      .8441   .8429   .9101   .8402   .8461




















It should be pointed out that great care was exercised in the 
construction of all of these tests, in order to insure similarity of the 
various forms. The difficulty of all forms was found to be very 
nearly equal, on the basis of preliminary testing of small groups. 
The similarity of the test forms used in the present experiment is 
probably greater than is commonly found in supposedly parallel forms. 
In fact, all of the forms of each test used in the present experiment 
consisted of identical material, but in a different arrangement. The 
material was also kept as homogeneous as possible within each form. 
It seems, therefore, that the influence of the particular test form 
upon reliability has been minimized rather than exaggerated in the
present findings.

Finally, as a fourth factor, one should include the somewhat less 
tangible influence of a change in the nature of the test with practice. 
This factor may operate in very different ways, depending upon the 
particular test. Some tests may change in nature in the course 
of practice, and yet the change may leave the reliability quite unaf- 
fected. Some might conceivably change in such a way as to make the 
reliability higher. In the case of many common psychological tests, 
however, the change is probably from a more reliable to a less reliable 
type of test. This is illustrated by two of the tests in the present 
experiment, namely Symbol-digit and Vocabulary. During the 
initial trials, performance on these tests depended largely upon speed 
of perception and learning, ability to memorize a code, and other 
similar variables in which individual differences are fairly large. At
the end of the practice period, however, most of the subjects had
probably reached the stage at which speed of writing played the chief
role in performance. This is a motor ability in which individual
differences are relatively small, and in which, consequently, momentary
variations loom relatively large. The result will be a lowering
of reliability from what it would otherwise be at such a practice
level.
In conclusion, the findings of the present experiment indicate that
test reliability tends to increase with practice in the test, when the 
test is administered by the time limit method and reliability is meas- 
ured by the odd-even technique. If discrepancies are found in suc- 
cessive reliability coefficients, they probably result from the presence 
of extraneous disturbing factors, four of which were illustrated in the 
present experiment. 


Why should practice raise the reliability coefficient? There is
no intrinsic reason why practice qua se should produce this effect.
Practice, however, has the same effect upon a time-limit test as that
produced by lengthening the test. Although, objectively, the number
of items in the test remains constant throughout the practice period,
the subject responds to an increasing number of items in successive
trials. Hence the effectual length of the test may be said to increase
with practice. The result is that, as practice proceeds, the test
becomes a more adequate sampling of the particular aspect of behavior
measured, which is just what the rise in reliability coefficient depicts.
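The analogy with lengthening can be made explicit through the general form of the Spearman-Brown formula. The symbols are assumptions introduced here: r_{11} is the reliability of the test at its original length and k is the factor by which its effective length is multiplied. The text itself reports no such computation, so this is only a sketch of the reasoning.

```latex
% General Spearman-Brown formula (symbols are illustrative assumptions)
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}
% For k = 2 this reduces to the half-to-whole form 2 r_{11} / (1 + r_{11}).
% Hypothetical figures: r_{11} = .70, k = 2 give 1.40 / 1.70, or about .82
```

On this reading, a subject who works through twice as many items in the allotted time is, in effect, taking a test twice as long, and the expected reliability rises accordingly.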
PERMANENCE OF VOCATIONAL INTERESTS


EDWARD K. STRONG, JR. 
Stanford University 


Presumably when a boy says he would like to be a lawyer, or something
else, he means that such an occupation, as he conceives it, would
afford him an opportunity to do something he wants to do. Thus a 
small boy’s desire to be a motorcycle traffic officer may be the result of 
an unconscious argument of the following nature: 

1. My father slows down when he sees a traffic officer. 

2. My father must be afraid of a traffic officer. 

3. If I were a traffic officer, my father would be afraid of me. 

4. That would be fun. 

There would seem to be at least four different factors involved in 
an occupational choice: (a) Knowledge of occupations, (b) interests 
or desires, (c) appreciation of these interests, and (d) understanding 
of how one’s desires can be secured by entry into this or that occupa- 
tion. Knowledge of occupations increases throughout life, interests 
change somewhat, appreciation of their significance changes much 
more, and necessarily understanding of how interests and occupations 
may be best fitted into one another changes very greatly. Under 
these circumstances it is not surprising that young people should 
flounder around and find the task of selecting an occupation a most 
puzzling and difficult one. 

Attempts to measure the permanency of vocational interests have 
been very largely confined to some such procedure as this: Subjects 
are requested to report their occupational choice on two separate 
occasions, the per cent of equivalent choices being considered a 
measure of permanency. In some studies, equivalence has been 
limited to specific occupations; in others, occupations have been 
classified into fifteen to thirty groups and only changes from one group 
to another have been deemed not equivalent. Fryer¹ summarizes the 
literature as follows: 


There is, in general, it would seem, about a fifty to fifty chance of predicting 
the development of an interest trend (second choice within the same occupational 
group as the first choice) over a considerable period of time—fifty for and fifty 
against. This prediction is, of course, far above a guessing basis when it is recalled 
that there are as many as thirty-three possible trends which were used in some of 





¹ Fryer, Douglas: The Measurement of Interests, 1931, p. 150-151. 

the studies. The presence of interest trends is an important factor to consider in 
mental development, but it is not a factor that can be predicted with any high 
degree of accuracy. 


If our analysis of the situation is anywhere near correct, it means 
that the above studies demonstrate the degree of permanence of “first 
choice” but they do not determine the permanence of interests
themselves. Changes in first choice may be attributed not only to changes 
in interests but also to better understanding of occupations, or of one’s 
own interests, or of how the latter may be best expressed through the 
former. 

The evidence is clear that interests are very stable from twenty-five 
to fifty-five years of age. The evidence is also clear that there are 
real changes from fifteen to thirty years of age. To a large degree 
this is due to the development during later adolescence of interests 
which are seemingly largely sublimations of the parental instinct. 
Although there are these changes, nevertheless interest patterns are 
really surprisingly stable from fifteen years of age on. 

If the Vocational Interest Test merely reported one or more scores, 
indicative of the strength of certain interests, as does the Bernreuter 
Inventory in the case of personality traits, it would render the single 
service of aiding the individual to understand better what his interests 
are and their relative strength. But the interest test goes further and 
interprets a wide sampling of specific interests in terms of occupational 
choices. It accomplishes this on the assumption that if a man has the 
same general constellation of interests that are found among men 
successfully engaged in an occupation, then, as far as interests go, that 
occupation is one to enter. The interest test eliminates the necessity 
(1) of knowing what is involved in the various occupations, (2) of 
cataloging one’s interests and estimating their relative values, and (3) 
of determining which occupations will provide maximum opportunity 
of doing what one wants to do and minimum necessity of doing what 
one does not want to do. 

The permanency of occupational interests as reported here is based 
on how close an approximation to the interests of various occupational 
criterion groups is found in the interests of college seniors at the time 
they were seniors and five years later. Although the data do not 
express the degree of interest in any one direction they are nevertheless 


* Strong, E. K. Jr.: Change of Interests with Age, 1931. 
“Interest Maturity,” Personnel Journal, Vol. XII, No. 2, August, 
measures of the permanency of interests themselves for they record 
the similarity of a man’s interests regarding 420 items to certain 
defined constellations of interests. 


FIVE YEAR FOLLOW-UP OF COLLEGE SENIORS 


Two hundred twenty-three of the 1927 seniors at Stanford 
University filled out the Vocational Interest Test in 1927 and again 
five years later. On four different occasions, namely, March 1927, 
January 1928, January 1929, and March 1932, they reported their 
vocational choice and what they were doing to earn a living. 

Table I reports the coefficients obtained by correlating the occu- 
pational interest scores from the two sets of blanks. Permanence of 
interests for five years when thus measured varies from .59 to .84 
with an average of .75 on 21 scales. If these correlations are corrected 
for attenuation the average coefficient is raised to .84. Such coeffi- 
cients establish the fact that there is a surprising constancy of interests 
among college graduates in the five-year period from twenty-two to 
twenty-seven years of age. 
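
The corrected figures in Table I appear to follow the usual correction for attenuation, in which the raw correlation is divided by the geometric mean of the two reliabilities. The sketch below assumes that the odd-even reliability of a scale applies equally to the 1927 and 1932 blanks; that assumption, and the function name, are not stated in the text.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Mathematician scale from Table I: raw r = .75, odd-even reliability = .90.
# The same reliability is assumed for both testings.
print(round(correct_for_attenuation(0.75, 0.90, 0.90), 2))

# Average of the twenty-one scales: raw r = .75, average reliability = .89.
print(round(correct_for_attenuation(0.75, 0.89, 0.89), 2))
```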

This correlation coefficient of .84 means that approximately the 
same rank order is maintained for interest scores in both 1927 and 
1932. In other words, those who had interests most similar to 
engineers, or lawyers or ministers on the first occasion were the ones 
who had scores most similar to these criterion groups on the second 
occasion and vice versa. But such correlations do not disclose whether 
the interest scores in 1932 were actually higher or lower than the 
scores in 1927. Let us consider then the permanency of occupational 
interests from the standpoint of amount of scores rather than with 
respect to relative relationship of scores. 

Since occupational interest scores are measures of approximation 
to the interests of men whose average age is in the neighborhood of 
forty years and since interests change somewhat from age twenty to 
age forty, it is to be expected that college seniors would score somewhat 
lower on these scales at that time than five years later. This is shown 
to be true in Table II where the data express the average number of 
changes in ratings for this period. (Change from C rating to B— is 
counted as one step, from C to B as two steps, from C to B+ as three 
steps, etc.) According to these figures the average college man will 
secure .41 of a step higher ratings in his occupational scales at age 
twenty-seven than at age twenty-two and at the same time he will 
secure .16 of a step lower ratings on these scales, or a net change of 
TABLE I. PERMANENCE OF INTERESTS. COLLEGE SENIORS TESTED IN 1927 AND 
AGAIN IN 1932. (N = 223) 

Columns: (1) raw correlation coefficient; (2) correlation coefficient
corrected for attenuation; (3) reliability of scale (odd-even technique);
(4) correlation with interest maturity.

Group | Occupational scale          | (1)  | (2) | (3)  | (4)
I     | Mathematician               | .75  | .83 | .90  | -.07
      | [illegible]                 | .81  | .86 | .94  | -.26
      | Engineer                    | .84  | .90 | .94  | -.36
      | [illegible]                 | .84  | .94 | .89  | -.18
      | [illegible]                 | .78  | .88 | .88  | -.12
      | Psychologist                | .79  | .88 | .89  |  .21
      | [illegible]                 | .74  | .88 | .84  | -.05
      | [illegible]                 | .81  | .93 | .88  | -.46
      | [illegible]                 | .71  | .77 | .92  | -.07
      | [illegible]                 | .77  | .86 | .90  |  .26
      | [illegible]                 | .79  | .84 | .93  |  .15
      | [illegible]                 | .80  | .88 | .90  |  .11
IIb   | Life insurance salesman     | .80  | .88 | .90  |  .20
      | Real estate salesman        | .76  | .84 | .90  | -.10
      | Minister                    | .64  | .70 | .92  |  .74
      | [illegible]                 | .66  | .73 | .90  |  .67
IIIb  | Y.M.C.A. secretary          | .67  | .76 | .88  |  .68
      | Y.M.C.A. physical director  | .71  | .80 | .89  |  .38
      | Personnel manager           | .62  | .80 | .78  |  .55
      | City school superintendent  | .70  | .81 | .86  |  .76
IV    | Office worker               | .74  | .83 | .89  |  .17
      | Vacuum cleaner salesman     | .73, .16 (column positions unclear in source)
V     | Certified public accountant | .59¹ |     | .74² |  .37
      | Interest maturity           | .69  | .88 | .78  |
      | Average of twenty-one scales| .75  | .84 | .89  |

¹ Original scale. 
² Revised scale. 

.25 of a step in the direction of a higher rating. This means that if a 
twenty-two-year-old man received a B rating on all sixteen scales he 
would on the average have twelve B ratings and four B+ ratings five 
years later, or more accurately still, he might have six B+ ratings, 
eight B ratings, and two B- ratings. 

As seventy per cent (see Table III) of all the ratings obtained by 
college seniors are C the chance of raising the ratings is greater than 
that of lowering them. The maximum possible changes are indicated 
in the third and fourth columns of Table II. In terms of these 
possible changes it appears that the total changes in ratings which 
actually occur amount to .12 upwards and .23 downwards of what 
might happen. 

TABLE II. AVERAGE CHANGE IN INTEREST RATINGS AFTER FIVE YEARS. BASED 
ON THE RECORDS OF TWO HUNDRED TWENTY-THREE COLLEGE SENIORS 

Columns: (1), (2) average change in rating after five years, upwards and
downwards; (3), (4) maximum change possible in terms of the five ratings
A, B+, B, B-, and C, upwards and downwards; (5), (6) per cent actual
change is of maximum possible change, upwards and downwards.

Occupational scales         | (1) | (2) | (3)  | (4)  | (5) | (6)
Personnel manager           | .68 | .21 | 2.91 | 1.09 | .23 | .19
[illegible]                 | .78 | .10 | 3.37 |  .63 | .23 | .16
Journalist                  | .42 | .26 | 2.68 | 1.32 | .16 | .20
[illegible]                 | .54 | .13 | 3.07 |  .93 | .18 | .14
[illegible]                 | .39 | .27 | 2.90 | 1.10 | .13 | .25
[illegible]                 | .45 | .21 | 2.63 | 1.37 | .17 | .15
Purchasing agent            | .43 | .19 | 2.79 | 1.21 | .15 | .16
Certified public accountant | .57 | .05 | 3.77 |  .23 | .15 | .22
Real estate salesman        | .33 | .25 | 3.10 |  .90 | .11 | .28
Mathematician               | .47 | .09 | 3.81 |  .19 | .12 | .47
Life insurance salesman     | .28 | .18 | 3.28 |  .72 | .09 | .25
City school superintendent  | .29 | .18 | 3.72 |  .28 | .08 | .46
[illegible]                 | .32 | .09 | 3.46 |  .54 | .09 | .16
Advertising man             | .20 | .12 | 3.59 |  .41 | .06 | .29
Y.M.C.A. secretary          | .19 | .12 | 3.72 |  .28 | .05 | .43
Minister                    | .22 | .09 | 3.81 |  .19 | .06 | .47

Average                     | .41 | .16 | 3.29 |  .71 | .12 | .23

The raw correlation coefficients expressing permanence of interests 
over a five year period in Table I are lower than they should be 
because the scales do not have perfect reliability (see third column of 
the table). Correction for this attenuating factor is given in the 
second column of the table. The raw coefficients are affected also 
by the fact that the various scales are varyingly influenced by matu- 
rity. The interests of engineers, for example, correlate —.36 with 
interest maturity whereas the interests of ministers correlate .74 with 
interest maturity. In other words, the interests of the former are 
more in harmony with the interests of fifteen-year old boys and 
the latter are more in harmony with the interests of adult men. 
It is accordingly much easier for a twenty-two year old man to 
obtain a high rating on the engineering scale than on the minister 
scale. 

There is a rank order correlation of —.80 between interest maturity 
(column 4 of Table I) and permanence of interest as expressed in 
column 1 of Table I. This situation leads to the hypothesis that if 
the varying effects of interest maturity could be deducted from the 
various occupational scores there would result correlations of per- 
manence that would approximate each other for all the occupational 
scales; furthermore that many such correlations would be considerably 
higher than those given in column 1 of Table I. 
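
A rank order correlation of the kind reported above can be computed as the ordinary correlation between two sets of ranks. The sketch below is an illustration only: it uses the eleven scales whose names and figures are legible in Table I (the engineer and minister rows are identified from the figures quoted in the text), so it is not expected to reproduce the published value of -.80, which rests on the full set of scales. The helper names are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_correlation(x, y):
    """Rank order correlation: product-moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Legible rows of Table I: (raw permanence, correlation with interest maturity).
SCALES = {
    "Mathematician": (0.75, -0.07),
    "Engineer": (0.84, -0.36),
    "Psychologist": (0.79, 0.21),
    "Life insurance salesman": (0.80, 0.20),
    "Real estate salesman": (0.76, -0.10),
    "Minister": (0.64, 0.74),
    "Y.M.C.A. secretary": (0.67, 0.68),
    "Y.M.C.A. physical director": (0.71, 0.38),
    "Personnel manager": (0.62, 0.55),
    "City school superintendent": (0.70, 0.76),
    "Office worker": (0.74, 0.17),
}
PERMANENCE = [pair[0] for pair in SCALES.values()]
MATURITY = [pair[1] for pair in SCALES.values()]

print(rank_correlation(PERMANENCE, MATURITY))
```

On this subset the coefficient is strongly negative, which agrees in direction with the relationship described in the text.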

Scores on the Vocational Interest Test are ordinarily converted 
into ratings.² The critical scores separating the five ratings lie 
between -1Q and -3½Q of the criterion groups. Actual scores 
occur over a much wider range. For example, the highest score of 
which we have record on the engineer scale is at 3.9Q and the lowest 
at -8.6Q. This represents a total range of five times that between 
-1Q and -3½Q. The reason for ratings being assigned as they are 
is that any score above -1Q represents sufficient engineer interest to 
warrant the conclusion that the individual has the interests of
engineers and any score below -3½Q indicates he does not have such 





1 The interest maturity scale is based on the differences in interests of fifteen- 
year old boys and fifty-five-year old men. See “Interest Maturity,” op. cit. 

² An A rating is assigned to scores above the -1Q of the criterion group, a B+ 
rating to scores between -1Q and -2Q, a B rating to scores between -2Q and 
-3Q, a B- rating to scores between -3Q and -3½Q, and a C rating to all scores 
below -3½Q. This is true for all but a few scales where deviations from this 
standard are made because of unusual distributions of scores from the criterion 
group. 
interests.¹ Moreover, there seems to be practically no correlation 
between degree of success and scores within the A rating range, 
although there is a positive correlation between degree of success and 
interest score over the entire range. 
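
The conversion of scores into ratings described in the footnote can be sketched as a simple threshold rule. The sketch assumes the score has already been expressed in Q units of the criterion group, and it assumes that a score falling exactly on a boundary takes the lower rating; the source does not say how boundary cases are treated, and the function name is hypothetical.

```python
def rating_from_q(score_in_q: float) -> str:
    """Map a score in Q units of the criterion group to one of five ratings.

    Cut points follow the footnote: -1Q, -2Q, -3Q and -3.5Q.
    Boundary handling (strictly greater than) is an assumption.
    """
    if score_in_q > -1.0:
        return "A"
    if score_in_q > -2.0:
        return "B+"
    if score_in_q > -3.0:
        return "B"
    if score_in_q > -3.5:
        return "B-"
    return "C"


# The extreme engineer-scale scores quoted in the text.
print(rating_from_q(3.9), rating_from_q(-8.6))
```

The rule makes plain why the middle range matters: only scores between -1Q and -3½Q can change rating through a small shift in score.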

The interest test differs from most tests in that all the high scores 
are considered equivalent and all the low scores are similarly considered 
equivalent, whereas in other tests variations in scores throughout the 
entire range are significant. It is particularly important in this 
connection to note how changes of interest over a five year period 
affect the middle range of scores. When a correlation of .84 is obtained 
there may be little variation in rank order at the extremes of the two 
distributions but noticeable variation in the middle of such
distributions. If that is the case with our occupational interest scores, then a 
correlation for permanence of .84 may be accompanied by many 
pronounced changes in ratings (i.e., scores in the middle range) within 
five years time. Is this the case? 


TABLE III. DISTRIBUTION OF SIXTEEN OCCUPATIONAL INTEREST RATINGS IN 
1927 AND 1932. (N = 223). EXPRESSED IN PERCENTAGES 

Occupational     | Ratings     | Distribution of corresponding 1932 ratings
interest ratings | obtained    |
                 | in 1927     |   C   |  B-  |  B   |  B+  |  A
A                |  4.2        |  .03  |  .06 |  0.3 |  1.2 |  2.7
B+               |  8.0        |  .5   |  .7  |  1.9 |  2.6 |  2.3
B                | 11.6        |  2.2  |  1.6 |  3.7 |  2.7 |  1.3
B-               |  9.2        |  2.8  |  2.1 |  2.8 |  1.2 |  0.3
C                | 70.0        | 52.2  |  6.0 |  6.2 |  2.0 |  0.6

Total            | [illegible] | 57.7  | 10.5 | 14.9 |  9.7 |  7.2

Table III gives the distribution of sixteen occupational interest 
ratings for two hundred twenty-three men in 1927 (column one) and 
the corresponding distribution of these ratings in 1932. There were 
4.2 per cent A ratings in 1927 and 2.7 per cent A ratings and 1.2 per 
cent B+ ratings in 1932. Evidently if a man had an A rating in 1927 
there was very little chance of his receiving anything but an A or B+ 





1 In the light of recent developments, it might possibly be better to extend the 
B— rating to —4Q, particularly when the test is given to young men under twenty 
years of age as we know now that these low scores will become somewhat higher, 
on the average, in the next ten years. 
TABLE IV. THE CHANCE THAT A COLLEGE SENIOR WILL RECEIVE THE SAME OR 
ANOTHER INTEREST RATING FIVE YEARS LATER 

Occupational  | Percentage  | Percentage of those receiving a given rating in
interest test | originally  | 1927 who received the same or another rating in 1932
rating        | receiving   |   C   |  B-  |  B   |  B+  |  A
A             |  4.2        |  0.7  |  1.3 |  6.4 | 28.7 | 62.8
B+            |  8.0        |  6.8  |  8.5 | 23.2 | 32.2 | 29.4
B             | 11.6        | 19.0  | 14.0 | 32.2 | 23.6 | 11.2
B-            |  9.2        | 30.2  | 22.4 | 30.7 | 13.2 |  3.9
C             | 70.0        | 77.7  |  9.0 |  9.2 |  3.0 |  0.9

rating five years later. On the other hand, there were 70.0 per cent 
C ratings in 1927 and in five years these had changed so that there 
were only 52.2 per cent C ratings, with 6.0 per cent B— ratings, 6.2 
per cent B ratings and 2.0 per cent B+ ratings. If a man had a C 
rating at twenty-two years of age there is one chance in thirty-five 
that his rating might be raised to B+ and one chance in one hundred 
seventeen that it might be raised to A when he is twenty-seven years 
old. Expressed in another way, 63.3 per cent of the ratings will be 
identical each time and 84.6 per cent of the ratings will agree within 
one rating of each other. 

Table IV presents the same data but in another form, expressing 
the percentage who receive a given rating in 1932 in terms of the 
number receiving a given rating in 1927. Thus if a man was rated A 
in 1927 there is a sixty-three per cent chance he will receive the same 
rating in 1932; if he received a C rating in 1927 there is a seventy- 
eight per cent chance he will obtain the same rating in 1932 and only a 
0.9 per cent chance that he will receive an A rating. 
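
The step from Table III to Table IV, and the two agreement figures quoted above, can be sketched as follows. The cell values are read from Table III; the B+ to B- cell is hard to read in the source and is taken here as 0.7 from the B- column total, which is an assumption. Each row is divided by the sum of its own cells, so the results may differ from the printed Table IV in the last digit. Function names are illustrative.

```python
RATINGS = ["C", "B-", "B", "B+", "A"]

# Rows: rating in 1927. Columns: rating in 1932. Both follow RATINGS order.
# Entries are percentages of all ratings, as in Table III.
JOINT = [
    [52.2, 6.0, 6.2, 2.0, 0.6],   # C in 1927
    [2.8, 2.1, 2.8, 1.2, 0.3],    # B- in 1927
    [2.2, 1.6, 3.7, 2.7, 1.3],    # B in 1927
    [0.5, 0.7, 1.9, 2.6, 2.3],    # B+ in 1927 (0.7 inferred, see above)
    [0.03, 0.06, 0.3, 1.2, 2.7],  # A in 1927
]


def conditional_percentages(joint):
    """Express each row as percentages of that row's own total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in joint]


def agreement(joint, tolerance):
    """Sum the cells whose 1927 and 1932 ratings differ by at most `tolerance` steps."""
    return sum(
        cell
        for i, row in enumerate(joint)
        for j, cell in enumerate(row)
        if abs(i - j) <= tolerance
    )


print(round(agreement(JOINT, 0), 1))  # identical ratings on both occasions
print(round(agreement(JOINT, 1), 1))  # ratings within one step of each other
for rating, row in zip(RATINGS, conditional_percentages(JOINT)):
    print(rating, [round(value, 1) for value in row])
```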

This table expresses extremely well what is meant by the five 
ratings. Primarily, the interest test answers the single question, Does 
the man have the interests of the occupation in question? The 
rating of A means “Yes, he does have such interests”; the rating of C 
means “No” and the ratings of B-, B, and B+ mean he probably 
has, with little certainty attaching to B- and considerable certainty 
in the case of B+. After five years a retest actually shows that an A 
rating will be a B rating or higher in ninety-eight per cent of cases 
and that a C rating will be a B rating or lower in ninety-six per cent of 
cases. 

CONCLUSION 


When the interests of college seniors are expressed in terms of their 
approximation to the interests of adult men in various occupations it 
appears that the rank order of such interests will correlate .84 with the 
rank order of their interests five years later. At the same time, the 
scores obtained at twenty-seven years of age will average .25 of a 
rating higher than at twenty-two years of age. This change is due 
largely if not entirely to the fact that interests change with age and 
consequently the interests of twenty-seven year old men will approxi- 
mate more closely the interests of forty year old men than will the 
interests of twenty-two year old men. Aside from this effect of 
increasing maturity, there is remarkable permanence of interests within 
the age range of twenty-two to twenty-seven years, whether 
measured in terms of rank order or in terms of actual size of score. 
Moreover, increasing maturity has actually a small effect, for
eighty-five per cent of all ratings will agree within one rating of each other on 
both occasions. 

The prevailing view that interests are far from permanent has been 
fostered by studies in which vocational choices on two different 
occasions have been contrasted. But such choices represent not 
only interests but also knowledge of the occupation and understanding 
of the possibilities of one’s interests being best secured through the 
medium of the occupation. Our results suggest that the relatively 
low permanence of “first choices” is occasioned by changes in
knowledge of occupations and in interpretation of the significance of that 
knowledge rather than in changes of the entire constellation of interests. 

Prognostication of future behavior can not safely be based upon 
the presence or absence of any single interest but it does appear that 
to a considerable degree at least it can be based upon the entire 
constellation of interests. 







THE METHOD OF INTERNAL CONSISTENCY¹ 
FOR SELECTING TEST ITEMS 


JOSEPH ZUBIN 


Department of Psychology, New York State Psychiatric 
Institute and Hospital 


In the validation of psychoneurotic inventories and other tests 
of the questionnaire type it is often impossible to obtain independent 
external criteria for item analysis. In such cases it is customary to 
devise a scoring key based upon logical considerations and to give a 
credit of one unit for each item answered “‘correctly” or in accordance 
with the logical key. After computing the total score of each indi- 
vidual, the items are validated against the total score as a criterion. 
If no association exists between the ‘‘correct” response and the 
criterion, the item is discarded since it adds nothing to the measuring 
quality of the test. If the item is found to be associated with the 
criterion, the degree of the relationship is investigated.² In order to 
determine the degree of association some assumption must be made 
with regard to the distribution of the responses to the item, and if no 
such assumption is warranted, the degree of relationship cannot 
be established. 

There are three techniques for carrying out this analysis—the 
bi-serial r, critical ratio, and association methods. Each of these 
methods of analysis is serviceable in determining the value of an item 
for a given test. The purpose of this paper is to compare the relative 
merits of these three methods. 

There is one difficulty which is inherent in the Criterion of Internal 
Consistency and is common to all the methods. It results from the 
fact that the criterion itself includes the item under analysis. This 
brings about a spurious relationship between item and criterion. 
The method of removing this spuriousness will be indicated for each 
method. 





1 The author is grateful to Professor Truman L. Kelley for reading parts of the 
manuscript and to Dr. E. E. Cureton and Dr. Jack W. Dunlap for offering many 
valuable suggestions. 

² The size of the index of association, whether it be χ², the critical ratio, or some 
other index, is no indication of the degree of association. It merely indicates how 
certain the existence of the association is. 

CRITICAL RATIO METHOD³ 


This method consists of a comparison between the mean in total 
score of those who answered the item “correctly” and the mean of 
those who failed to answer it “correctly.” If the mean of the
“successes” is significantly larger than that of the “failures,” the item is 
retained. If the difference is not reliable, the item is excluded, and 
if the difference is reliably negative, the item itself or the manner of 
its scoring is revised so as to eliminate the negative relationship with 
the total score. Since the item in question is included in the total 
score, the mean of the “successes” is necessarily greater than it would 
have been if the item were not included. This spurious increase in 
mean may cause a reliable difference to appear when in reality no 
such difference exists. 

Let the subscript q represent the statistics of the “successes” 
and the subscript p, the statistics of the “failures.” Then the critical 
ratio for the difference between $M_q$ and $M_p$ is 

$$C = \frac{M_q - M_p}{\sqrt{\sigma_{M_q}^2 + \sigma_{M_p}^2}}$$

Letting a prime represent the statistics when the item under 
analysis, y, is excluded from the total score⁴ 

$$C' = \frac{M_q' - M_p'}{\sqrt{{\sigma'_{M_q}}^2 + {\sigma'_{M_p}}^2}}$$

The removal of item y from the total score will affect the total 
score of the “successes” only, since only the “successes” were credited 
with it. 

Hence 

$$M_q' = M_q - 1 \quad \text{and} \quad M_p' = M_p$$

The standard deviation of the “failures,” $\sigma_p$, remains unchanged, 
since the scores of the “failures” remain unchanged. The standard 





³ A complete discussion of this method will be presented in a forthcoming 
paper: “The Correlation Coefficient versus the Critical Ratio as Methods of 
Expressing Relationship” by Joseph Zubin and J. B. Maller. 

⁴ The correlation between $M_q$ and $M_p$ in random samples is equal to zero, as 
shown in “Standard Errors of Statistics of Frequency Distributions” by J. B. 
Maller and Joseph Zubin, to be published. 
deviation of the “successes” also remains unchanged, since each 
“success” was increased by a constant, unity, and adding a constant 
to a series of scores does not alter their standard deviation. 
Hence, 

$$\sigma'_{M_p} = \sigma_{M_p} \quad \text{and} \quad \sigma'_{M_q} = \sigma_{M_q}$$

$$C' = \frac{M_q - M_p - 1}{\sqrt{\sigma_{M_q}^2 + \sigma_{M_p}^2}} = C - \frac{1}{\sqrt{\sigma_{M_q}^2 + \sigma_{M_p}^2}} \qquad \text{(I)}$$

Even if C' were zero, a reliable difference could be obtained if the 
sigma of the difference were equal to .3 or less, which is not unusual. 
This spuriousness may be readily eliminated by reducing the difference 
in the means by unity. 

COMPUTATION OF THE CRITICAL RATIO 

The computation of the critical ratio may be facilitated in the 
following manner. Since the mean of the total scores, $M_x$, and its 
sigma, $\sigma_x$, is constant for all the items, only the mean and sigma in 
total score of the “successes” for each item need be computed. The 
mean and sigma in total scores of the “failures” can be obtained as 
follows: 

$$pM_p + qM_q = M_x$$

Hence 

$$M_p = \frac{M_x - qM_q}{p}; \qquad p + q = 1.$$

The difference between the means can then be obtained as follows: 

$$M_q - M_p = \frac{M_q - M_x}{p} \qquad \text{(II)}$$

The standard deviation of the “failures,” $\sigma_p$, can be obtained as 
follows: 

$$\sigma_x^2 = p[\sigma_p^2 + d_p^2] + q[\sigma_q^2 + d_q^2]$$

where 

$$d_p = M_p - M_x \quad \text{and} \quad d_q = M_q - M_x$$

Hence 

$$d_q - d_p = M_q - M_p \quad \text{and} \quad d_p = -\frac{q}{p}\,d_q$$

Substituting 
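
$$\sigma_x^2 = p\sigma_p^2 + \frac{q^2}{p}\,d_q^2 + q[\sigma_q^2 + d_q^2]$$

and 

$$\sigma_p^2 = \frac{1}{p}\left[\sigma_x^2 - q(\sigma_q^2 + d_q^2) - \frac{q^2}{p}\,d_q^2\right] = \frac{1}{p}\left[\sigma_x^2 - q\sigma_q^2 - \frac{q}{p}\,d_q^2\right]$$

and the critical ratio when the item under analysis is excluded from 
the total score, is (N denoting the total number of cases) 

$$C' = \frac{(d_q - p)\sqrt{pN}}{\sqrt{p\left[\sigma_x^2 + \frac{p - q}{q}\,\sigma_q^2\right] - q\,d_q^2}} \qquad \text{(III)}$$
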


Hence, in order to apply the critical ratio method to the validation 
of test items, we need to compute separately for each item only two 
measures: The mean and the standard deviation of the “successes.” 
The critical ratio for an item can then be determined by substituting 
these values in equation (III). If the assumption of normality 
for the dichotomous variable (item) is warranted and if the regression 
is linear and homoscedastic,⁵ the critical ratio obtained in the above 
manner can be converted into a correlation coefficient,³ equivalent 
to the bi-serial. 
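
The computing scheme of this section can be sketched in Python as follows. The sketch assumes that the sigma of a mean is the standard deviation divided by the square root of the number of cases, and that standard deviations are computed with N (not N - 1) in the denominator; both are assumptions about the intended formulas, and the function names are hypothetical. A second function computes the same ratio directly from the two groups as a check.

```python
from math import sqrt
from statistics import fmean, pstdev, pvariance


def corrected_critical_ratio(total_mean, total_sd, succ_mean, succ_sd, q, n):
    """Critical ratio with the item removed, from summary figures only.

    total_mean, total_sd: mean and sigma of all total scores (item included)
    succ_mean, succ_sd:   mean and sigma in total score of the successes
    q: proportion of successes; n: total number of cases
    """
    p = 1.0 - q
    d_q = succ_mean - total_mean
    mean_difference = d_q / p                      # equation (II)
    fail_variance = (total_sd ** 2 - q * succ_sd ** 2 - (q / p) * d_q ** 2) / p
    sigma_difference = sqrt(succ_sd ** 2 / (q * n) + fail_variance / (p * n))
    return (mean_difference - 1.0) / sigma_difference  # unity removed, as in (I)


def direct_corrected_critical_ratio(successes, failures):
    """Same quantity computed from the raw total scores of the two groups."""
    sigma_difference = sqrt(
        pvariance(successes) / len(successes) + pvariance(failures) / len(failures)
    )
    return (fmean(successes) - 1.0 - fmean(failures)) / sigma_difference


# Hypothetical total scores (item included) for ten individuals.
successes = [5, 6, 7, 8]
failures = [2, 3, 4, 5, 3, 4]
everyone = successes + failures
print(
    corrected_critical_ratio(
        fmean(everyone), pstdev(everyone),
        fmean(successes), pstdev(successes),
        len(successes) / len(everyone), len(everyone),
    )
)
```

Only the mean and sigma of the successes change from item to item, which is the economy the section describes.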


BI-SERIAL CORRELATION METHOD 


A second method for determining the validity of an item is to 
compute only the means in total score of the “successes” and of the 
“failures” and then to determine the bi-serial correlation between the 
item and the total score by means of the formula 

$$r_{xy} = \frac{M_q - M_p}{\sigma_x} \cdot \frac{pq}{z}$$

where $M_q$ and $M_p$ are, as in the previous section, the means in total 
score (including the item y whose correlation with the total is sought) 




⁵ A partial check on the homoscedasticity of the distribution may be applied 
by noting whether $\sigma_q = \sigma_p$ within the limits of sampling errors (suggested by Dr. 
E. E. Cureton). 
of the “successes” and the “failures,” respectively, p = proportion 
of “failures,” q = proportion of “successes,” z is the ordinate at the 
point of dichotomy in y in a unit normal distribution, and $\sigma_x$ is the 
standard deviation of total scores.⁶ Again letting the primed
subscripts represent the statistics of the distribution not including the 
item under analysis, y, 

$$r_{x'y} = \frac{M_q' - M_p'}{\sigma_{x'}} \cdot \frac{pq}{z}$$

now 

$$M_q' - M_p' = M_q - M_p - 1$$

as shown previously. 

Now 

$$\sigma_{x'} = \sigma_{(x-y)} = \sqrt{\sigma_x^2 + \sigma_y^2 - 2r_{xy}\sigma_x\sigma_y}$$

Under the assumptions required for the bi-serial r, the standard
deviation of the item is unity, and since 

$$(M_q - M_p)\frac{pq}{z} = r_{xy}\sigma_x$$

$$r_{x'y} = \frac{(M_q - M_p - 1)\,\frac{pq}{z}}{\sqrt{\sigma_x^2 + 1 - 2r_{xy}\sigma_x}}$$

or 

$$r_{x'y} = \frac{r_{xy}\sigma_x - \frac{pq}{z}}{\sqrt{\sigma_x^2 + 1 - 2r_{xy}\sigma_x}} \qquad \text{(IV)}$$


Hence even when no relationship whatsoever exists between 
x' and y ($r_{x'y} = 0$), the correlation between the item and the total 
including the item, will not vanish, but will be equal to $\frac{pq}{z\sigma_x}$. The 
maximum value that $\frac{pq}{z}$ can have is about .6. Hence, even when no 
relationship exists between the item and the “true” criterion, the 
apparent correlation due to the inclusion of the item under analysis 
in the total score can be as high as $\frac{.6}{\sigma_x}$. 
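
The size of the spurious element, and the correction of equation (IV), can be sketched as follows. The sketch takes z to be the ordinate of the unit normal curve at the point cutting off the proportion q, which is how the text defines it; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def ordinate(q):
    """Ordinate of the unit normal curve at the point of dichotomy for proportion q."""
    return UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(q))


def spurious_biserial(q, total_sd):
    """Bi-serial r with the total (item included) when the true relationship is zero."""
    p = 1.0 - q
    return p * q / (ordinate(q) * total_sd)


def corrected_biserial(r_xy, q, total_sd):
    """Equation (IV): bi-serial r between the item and the total excluding the item."""
    p = 1.0 - q
    numerator = r_xy * total_sd - p * q / ordinate(q)
    return numerator / sqrt(total_sd ** 2 + 1.0 - 2.0 * r_xy * total_sd)


# pq/z is largest when the item splits the group evenly.
print(round(0.5 * 0.5 / ordinate(0.5), 3))

# Hypothetical item passed by 30 per cent, total scores with sigma = 5.
print(round(spurious_biserial(0.30, 5.0), 3))
```

With a short test (small sigma of total scores) the spurious element is large; it shrinks as the spread of total scores grows.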


For computation purposes $r_{x'y}$ may be rewritten 





⁶ The symbols are those utilized by Kelley in Statistical Method, Macmillan, 
1924, pp. 350-351. 

$$r_{x'y} = \frac{q}{z} \cdot \frac{M_q - M_x - p}{\sqrt{\sigma_x^2 + 1 - 2(M_q - M_x)\frac{q}{z}}}$$








The bi-serial correlation is of value only when the assumption 
of normality in the dichotomous variable is warranted. If the 
assumption of normality is not made, and the individuals falling in 
each of the two parts of the dichotomy are regarded as having identical 
scores, (1 for the successes and 0 for the failures) then⁷ the correlation 
coefficient for this two point distribution may be written 

$$r' = \frac{M_q - M_p}{\sigma_x}\sqrt{pq}$$








This coefficient r' bears the same relationship to the bi-serial 
coefficient, r, that Yule’s four-fold r bears to the tetrachoric r. If we 
correct the equation for the spuriousness involved in correlating an 
item with the total including the item, 

$$r'_{x'y} = \frac{M_q' - M_p'}{\sigma_{x'}}\sqrt{pq} = \frac{r'_{xy}\sigma_x - \sqrt{pq}}{\sqrt{\sigma_x^2 + pq - 2r'_{xy}\sigma_x\sqrt{pq}}}$$

This can be arrived at more directly by noting that⁸ 

$$r'_{x'y} = r'_{(x-y)y} = \frac{r'_{xy}\sigma_x - \sqrt{pq}}{\sigma_{(x-y)}}$$





where y is the score on the item under analysis. 
For computation purposes, 

$$r'_{x'y} = \frac{[M_q - M_x - p]\sqrt{q/p}}{\sqrt{\sigma_x^2 - 2q\left[M_q - M_x - \frac{p}{2}\right]}}$$
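
The two-point coefficient and its correction can be checked numerically. The sketch below computes r' from the group means, applies the correction, and compares the result with the ordinary correlation between the item and the total score with the item removed. Standard deviations use N in the denominator, and the data and function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def pearson(x, y):
    """Product-moment correlation, standard deviations computed with N."""
    mx, my = fmean(x), fmean(y)
    covariance = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return covariance / (pstdev(x) * pstdev(y))


def point_biserial(totals, item):
    """r' between a 0/1 item and the total score that includes the item."""
    q = fmean(item)
    p = 1.0 - q
    mean_success = fmean([t for t, y in zip(totals, item) if y == 1])
    mean_failure = fmean([t for t, y in zip(totals, item) if y == 0])
    return (mean_success - mean_failure) / pstdev(totals) * sqrt(p * q)


def corrected_point_biserial(totals, item):
    """r' between the item and the total with the item excluded."""
    q = fmean(item)
    p = 1.0 - q
    sd_x = pstdev(totals)
    r = point_biserial(totals, item)
    return (r * sd_x - sqrt(p * q)) / sqrt(sd_x ** 2 + p * q - 2.0 * r * sd_x * sqrt(p * q))


totals = [5, 6, 7, 8, 2, 3, 4, 5, 3, 4]   # total scores, item included
item = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]     # 1 = success, 0 = failure
rest = [t - y for t, y in zip(totals, item)]

print(point_biserial(totals, item), corrected_point_biserial(totals, item))
print(pearson(rest, item))  # should agree with the corrected coefficient
```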


COMPUTATION OF BI-SERIAL r 








The following procedure will facilitate the computation of the 
bi-serial r’s: 





7 Compare Richardson, H. W. and Stalnaker, John M.: “A Note on the Use of 
Bi-Serial R in Test Research.” Journal of General Psychology, Vol. VIII, No. 2; 
1933, pp. 463-464. 

8 Suggested by Mr. Max Smith. 
1. Divide the total score of each individual by $\sigma_x$ and call the result 
t (sigma scores). 
2. Add together the t scores for all the successes to obtain 

$$m_q = \frac{M_q}{\sigma_x}$$

The bi-serial r between an item and the total score including the 
item is 

$$r = \frac{M_q - M_p}{\sigma_x} \cdot \frac{pq}{z} = \frac{M_q - M_x}{p\,\sigma_x} \cdot \frac{pq}{z} = (m_q - m_x)\frac{q}{z}$$

Let the values of $(m_q - m_x)$ be designated by d. We can prepare a 
table for d and q/z and thus enable the determination of r with very 
little effort. Since the value of q/z is fixed for any value of q, Table I 
gives the values of q instead of q/z. The table is used as follows: 
If q = .30 and d = .70, look in the column q = .30 until the value of d 
approximating .70 is found, and read off the value of r on the left 
hand margin: .60. If the nearest tabled value of d is not close 
enough, interpolation may be resorted to. After the value of r is 
obtained, it can be readily corrected for the spuriousness discussed 
above by means of equation (IV). 


TABLE I. BI-SERIAL r FOR GIVEN PERCENTAGE OF “SUCCESSES” AND GIVEN 
STANDARD DIFFERENCE BETWEEN MEAN OF “SUCCESSES” AND TOTAL MEAN 

Entries are values of d. (q) = Percentage of “Successes” 

 r   | 0.001  | 0.05   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
 .10 | 0.3367 | 0.2063 | 0.1755 | 0.1554 | 0.1400 | 0.1271 | 0.1159 | 0.1058 | 0.0966 | 0.0880 | 0.0798
 .20 | 0.6734 | 0.4125 | 0.3510 | 0.3109 | 0.2800 | 0.2542 | 0.2318 | 0.2117 | 0.1932 | 0.1759 | 0.1596
 .30 | 1.0101 | 0.6188 | 0.5264 | 0.4663 | 0.4200 | 0.3813 | 0.3477 | 0.3175 | 0.2898 | 0.2639 | 0.2394
 .40 | 1.3468 | 0.8251 | 0.7020 | 0.6218 | 0.5600 | 0.5084 | 0.4636 | 0.4233 | 0.3863 | 0.3518 | 0.3192
 .50 | 1.6835 | 1.0314 | 0.8775 | 0.7772 | 0.7000 | 0.6355 | 0.5795 | 0.5291 | 0.4829 | 0.4398 | 0.3989
 .60 | 2.0202 | 1.2376 | 1.0530 | 0.9326 | 0.8399 | 0.7627 | 0.6954 | 0.6350 | 0.5795 | 0.5277 | 0.4787
 .70 | 2.3569 | 1.4439 | 1.2285 | 1.0881 | 0.9799 | 0.8898 | 0.8113 | 0.7408 | 0.6761 | 0.6157 | 0.5585
 .80 | 2.6936 | 1.6502 | 1.4040 | 1.2435 | 1.1198 | 1.0169 | 0.9272 | 0.8466 | 0.7726 | 0.7037 | 0.6383
 .90 | 3.0303 | 1.8564 | 1.5795 | 1.3989 | 1.2598 | 1.1439 | 1.0431 | 0.9525 | 0.8693 | 0.7916 | 0.7181
1.00 | 3.3670 | 2.0627 | 1.7550 | 1.5544 | 1.3998 | 1.2711 | 1.1590 | 1.0583 | 0.9659 | 0.8796 | 0.7979
ASSOCIATION METHOD


The association method consists of comparing the proportion
of “successes” among the individuals whose total scores lie above the
median with the proportion of “successes” among those whose total
scores lie below the median.
This may be done either by computing χ² for the four-fold table given
below or by calculating the critical ratio for the difference in proportion
of “successes” lying above and below the median respectively.⁹





“Successes” | “Failures”
This comparison involves two difficulties. First, since success
in the item raises the total score, there may be some individuals
whose scores would drop below the median if the item under analysis
were omitted from the total score. Secondly, the median of the total
scores is not a good reference point for all the items. The selection
of the median as a reference point makes the process of item analysis
rather simple and easy, but it tends to obscure some relationships and
to enhance or create others. If we select an easy item, one that was
passed by 85 individuals out of a total of one hundred, we find, if the
item is a good one, that the “successes” tend to have higher scores
than the “failures.” Letting five of the high scoring individuals fail
by chance and five of the low scoring individuals succeed, we should
find eighty “successes” among the high scorers and five among the low
scorers. But not all the eighty high scorers can fall above the median.
Some of these must fall below the median. Table A shows the four-fold
table for this case.








TABLE A
                         “Successes” | “Failures” | Total
Above median ..........       45     |      5     |   50
Below median ..........       40     |     10     |   50
                              85     |     15     |  100














χ² for this table is 1.9 and the critical ratio of the difference of the
proportions of “successes” lying above and below the median, respectively,
is 2.0. If instead of the median, the 85th percentile were taken
as the point of reference, Table B would result.





9 The equivalence of these two methods is proved by Professor Harold Hotelling
in a forthcoming publication.














Method of Internal Consistency 353 
TABLE B
                              “Successes” | “Failures” | Total
Above reference point ......       80     |      5     |   85
Below reference point ......        5     |     10     |   15
                                   85     |     15     |  100














χ² for this table is thirty-six and the critical ratio for the proportion
of “successes” is 11.5. It is apparent that the median is not a good
reference point for all the items. The percentile in total score
corresponding to the percentage of “successes” is better. The difficulty
that was pointed out above will hold true equally well of both easy and
difficult items. For, by interchanging the columns and the rows of
the four-fold table, the item can be changed from a difficult one into
an easy one, but the χ² value and the critical ratio remain unaltered.
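The effect of the reference point on χ² can be checked with a short sketch. It assumes the ordinary Pearson χ² for a four-fold table, without a continuity correction (the paper does not state which form was used), and the function name `chi_square_fourfold` is hypothetical. Applied to the cell counts of Tables A and B it gives roughly 1.96 and 36.9, in line with the values of 1.9 and thirty-six quoted in the text.

```python
def chi_square_fourfold(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the four-fold table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell counts taken from Tables A and B (successes, failures by row).
table_a = (45, 5, 40, 10)   # median as reference point
table_b = (80, 5, 5, 10)    # 85th percentile as reference point

if __name__ == "__main__":
    print(round(chi_square_fourfold(*table_a), 2))
    print(round(chi_square_fourfold(*table_b), 2))
    # Interchanging both rows and columns turns an easy item into a
    # difficult one but leaves chi-square unchanged.
    a, b, c, d = table_b
    print(chi_square_fourfold(d, c, b, a) == chi_square_fourfold(a, b, c, d))
```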

In some instances the selection of the median as a point of reference
results in the creation of a spurious relationship. This can be seen
most readily when we consider a case in which the items are responded
to by chance, as for example if a list of nonsense syllables is to be
matched with right or wrong.¹⁰ The distribution of items by percentage
passing (distribution of difficulty) will be normal and the
distribution of total scores will be similarly normal. For any given
item, the proportion of “successes” among the high scorers (lying
above the median in total score) may be equal to the proportion of
successes among the low scorers (lying below the median). But this
cannot hold true for all the items. Some items will be found whose
proportion of successes among high scorers will be higher than the
proportion of successes among the low scorers. Otherwise, the average
total score of those lying below the median would be equal to the average
total score of those lying above the median, which is impossible. This
situation is aggravated even further when instead of considering the
median in total score as the point of reference, comparisons are made
between the upper and lower quartile or between the highest and
lowest decile in total score. This initial selection tends to create
a correlation between the score on the item and the total score of an
individual. In order to remove this spuriousness, it is better to take
as the point of reference either the percentile in total score corresponding
to the percentage of “successes,” the median score of all the





10 Suggested by Dr. J. B. Maller.
“successes” in the item in question, or the point in total score
corresponding to the threshold of correctness in the given item.¹¹ After
choosing the reference point, χ² can be computed for the four-fold
table.





                              “Successes” | “Failures”
Above reference point ......       a      |      b
Below reference point ......       c      |      d
                                   q      |      p














If the assumption of continuity in the dichotomous variable is not
warranted, the four-fold r can be obtained, and if normality can be
assumed, the tetrachoric r can be computed.¹²
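The spurious relationship that a median split creates for chance-answered items, as argued above, can be illustrated with a small simulation. This is a sketch under stated assumptions: every item is answered correctly with probability .5, the group sizes and the seed are arbitrary, and the function name `median_split_differences` is hypothetical.

```python
import random


def median_split_differences(n_people: int = 200, n_items: int = 20, seed: int = 0):
    """Per-item difference in proportion passing, high half minus low half.

    All responses are generated purely by chance (assumed p = .5 per item).
    """
    rng = random.Random(seed)
    responses = [
        [1 if rng.random() < 0.5 else 0 for _ in range(n_items)]
        for _ in range(n_people)
    ]
    # Rank individuals by total score and split at the median.
    order = sorted(range(n_people), key=lambda i: sum(responses[i]))
    half = n_people // 2
    low, high = order[:half], order[half:]
    diffs = []
    for j in range(n_items):
        p_high = sum(responses[i][j] for i in high) / len(high)
        p_low = sum(responses[i][j] for i in low) / len(low)
        diffs.append(p_high - p_low)
    return diffs


if __name__ == "__main__":
    diffs = median_split_differences()
    # The average difference is positive although no item measures anything.
    print(sum(diffs) / len(diffs))
```

The sum of the per-item differences equals the difference between the mean total scores of the two halves, which is why it cannot be zero once the groups are formed on the total score.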


RELATIVE EFFICIENCY OF THE THREE METHODS 


Before comparing the efficiency of the three statistics (Critical
Ratio, Bi-serial r, and χ²), it should be pointed out that measures of
relationship fall into two major divisions: measures of the probability
of association and measures of the degree of association. Measures 
of the first class indicate only whether or not an item is at all related 
to the criterion. Measures of the degree of association indicate in 
addition the relative importance of the item and can be utilized as 
weights. 

When r, the bi-serial correlation coefficient, is used as a measure
of probability of association, it must be divided by its standard error.
The critical ratio of the difference between the mean of successes and
the mean of failures, $C_m$, is always larger than $C_r$ ($= r/\sigma_r$) when
$x^2 > 1 - r^2$, x being the sigma deviate of the point of truncation of the





11 This method is borrowed from psycho-physics and is used in threshold 
determination. Line has applied it as follows: The total scores are ranked in order 
of size and the point at which the number passing the given item equals the number 
failing, is selected as the reference point. See Line, W.: ‘‘The Growth of Visual 
Perception in Children.” Brit. J. Psy. Mon. Supp., 1931, No. 15. Another 
way of selecting this point is to obtain the median of the range in which passes 
and failures alternate. 

12 The computation of the tetrachoric r is simplified by the Alice Lee Tables 
in Biometrika; and diagrams for computing the tetrachoric r are given by Chesire, 
L. Saffir, M. and Thurstone, L. L.: ‘‘Computing Diagrams for the Tetrachoric 
Correlation Coefficient.”” University of Chicago Bookstore, 1933. 
dichotomous variable in a normal distribution.¹³ Hence $C_m$ is more
efficient than $C_r$ under the above conditions.¹⁴

It must be remembered that r requires the assumption of normality
in the distribution of the item. Since $C_m$ requires no such assumption,
it is to be preferred on this additional account also. Furthermore,
if the assumption of normality is warranted, $C_m$ can be transmuted
into r by means of the equation


1 
~ 1/C,2 -1/C2 +1 


where $C_1$ is the critical ratio of the difference between the means of the
“successes” and “failures” in the unit normal distribution of the item
and $C_2$ is the unit critical ratio of the means of the “successes” and
“failures” in total score (equal to the regular critical ratio divided
by $\sqrt{N}$). Tables giving the value of r for each value of $C_1$ and $C_2$
have been provided.¹⁵

The association method makes the fewest assumptions.
It does not, however, provide a direct measure of degree of association
although such a measure is available in the tetrachoric r. But both
χ² and the tetrachoric r are perforce less efficient than the critical ratio
and the bi-serial r since the latter two methods take into consideration
the gradations in one of the variables, while χ² and the tetrachoric r
consider both variables as dichotomous.







SUMMARY 


Three methods for internal validation of test items have been
reviewed: the critical ratio, bi-serial, and association methods. The
spuriousness due to the customary practice of including the item under
analysis in the total score was pointed out and formulae for correction
were provided. In general, the association method, in which the
percentage of “passes” lying above the median in total score becomes
the measure of validity, is the easiest to apply but it is subject to
serious limitations. Its modified form, in which a reference point
must be found for each item, is superior but much the most difficult
of all to compute. Next in order of ease of computation are the critical
ratio and the bi-serial r. The bi-serial r method is subject to the





13 See footnote 3. 


14 The true distributions of $C_m$ and $C_r$ are not available, and the tacit assumption
is made that when N is large they approach normality.
15 See footnote 3. 





356 The Journal of Educational Psychology 


assumption of normality in the dichotomous variable, but a coefficient
not involving this assumption, r′, can be used. The critical ratio
method is subject to no assumptions except those underlying the
derivation of the standard error of a mean and is, therefore, the most
useful measure. If the assumptions of linearity and homoscedasticity
are warranted, the critical ratio can be transmuted directly into the
bi-serial correlation coefficient.
THE RELATION BETWEEN RATE OF READING AND
SPEED OF ASSOCIATION


ARTHUR E. TRAXLER 
University High School, University of Chicago 


It has been known for many years that people read silently at 
unequal rates. During the last two decades there have been many 
studies of the factors influencing rate of reading. Several of these 
factors relate to the conditions under which reading is done. Some 
of them are the attitude of the reader, the purposes of the reader, 
the type of reading (silent or oral), the degree of concentration of 
attention, the nature and difficulty of the reading material, and the 
amount of practice in rapid silent reading. Some other factors which 
have been found to influence rate in considerable degree are general 
intelligence, breadth of reading vocabulary, and skill in the mechanical 
aspects of reading. 

Another factor which some psychologists believe to be a cause 
of differences in reading rate is speed of association of ideas. It 
has been observed in a general way that slow thinkers tend to be slow 
readers even when their general ability, the mechanical features of 
their reading, and the conditions under which they read are satis- 
factory. There is, however, little objective evidence on the relation of 
rate of association of ideas and rate of reading. 


THE PROBLEM OF THIS STUDY 


The purpose of this article is to report an attempt to study
objectively the relationship between rate of reading and speed of
association. The question may be stated as follows: Do some people 
read rapidly partly because they are able to associate ideas quickly 
with the words of the printed page, while others read slowly in part 
because they are essentially slow in their thought processes? 

The question is of considerable practical significance. Certain 
standards, based on averages, have been set up for reading rate. 
Many schools carry on programs of remedial reading, one of the 
purposes of which is to bring slow readers up to standard in rate of 
reading. The question is raised in this article as to whether effort 





1 The writer is indebted to Professor Frank N. Freeman for pointing out the
need for objective study of the problem and for counsel concerning methods of
attack.
in that direction may not be limited in the case of certain individuals
by the general rate at which their association of ideas takes place. 
Are people destined by differences in rate of association of ideas to 
vary considerably in the speed at which they can and do read? 


THE PROCEDURE USED IN THE STUDY 


The General Plan.—The problem was attacked by correlating rate 
of association with rate of reading. Scores on rate of reading were 
available in abundance, but it was necessary to gather data on associ- 
ation time. For this purpose, a testing situation was devised in 
which the results were free, as far as possible, from the influence of 
any factors other than rate of association. 

An attempt to relate reaction time on a test of this kind to rate 
of reading involves an important assumption. It is assumed that 
each individual has a general association rate which influences alike 
his rate of making meaningful reactions to continuous written discourse
and his speed of reacting to stimuli that are not in context, and
that by measuring this general factor as it is manifested in the latter 
instance, one can secure data which will help to show whether it is 
related to rate of reading in the former situation. It seems logical 
to assume that there is probably no qualitative difference in the 
association time involved in the two situations. 

The following criteria were observed in the construction of the 
association tests: 

1. The initial technique may be rather crude. It will be sufficient 
to indicate general trends. The technique may be refined later, if the 
preliminary study indicates that the problem is worth pursuing 
further. 

2. The vocabulary used in the tests should be easy, in order that 
the influence of general intelligence and knowledge of word meaning 
will be minimized. 

3. It is preferable to present the stimuli to the subject visually 
rather than orally, as this method affords less chance of misunderstand- 
ing them. This is particularly true of homonyms. 

4. A series of single stimulus words rather than phrases or 
sentences should be used, as the latter kind of stimulus would involve 
eye-movements. 

5. The number of words included in the tests should be large
enough to permit a considerable range in the time required for various
individuals to complete the series.

6. In order to develop a technique which is transferable to the
average school situation, it is permissible to keep a record of the total
time of each individual rather than the time of single reactions, as this
can be done with simpler measuring instruments.

7. The test should contain no words which may set up inhibitions 
because of their peculiar nature. 

8. As the best type of test for determining rate of association is 
not known, it is desirable to experiment with more than one type. 

9. Two forms of each test should be devised in order that the 
reliability of the test can be computed. 

10. The technique of measuring rate of association should be refined 
and improved later on the basis of the experience that is gained in the 
preliminary experiment. 

The Types of Association Tests—The tests devised for the first 
experimentation were a free-association test and a controlled-associa- 
tion test. Each form of the free-association test consisted of twenty- 
three words chosen from the three thousand most common words in 
Thorndike’s The Teacher’s Word Book. As the time used by the 
subjects in reacting to the words in the free-association test varied 
greatly, twenty-three words were enough to give a wide range in the 
total time of the various individuals. Care was used to avoid including 
words that might set up inhibitions in the responses. 

The words in the free-association test were shown to the subject 
one at a time by means of a tachistoscope operated by the examiner. 
In the administration of the test the subject was seated facing the 
tachistoscope and the following directions were given: 


Words will be shown to you one at a time through the opening. As soon as you
see a word, say the first thing the word makes you think of. For example, pencil
might bring to your mind pen, or paper, or write, or any of several other words.
Do the test as quickly as you can.


When it was apparent that the subject understood what was 
expected of him, the examiner exposed a word. As soon as the 
subject responded, the examiner closed an electric switch, causing 
the tachistoscope to show the next word. By this technique and 
the one used with the controlled-association test, the reaction time 
of the examiner as well as that of the subjects was involved in the 
results. However, since one individual administered all the tests, 
it can be assumed that the reaction time of the examiner was prac- 
tically a constant factor. 
The controlled-association test consisted of a test of easy antonyms.
Two forms of fifty words each were devised. With three exceptions, 
the words in the second form consisted of the opposites of those in the 
first form, but the order of appearance was different. In the con- 
struction of the test, a somewhat longer list of words was prepared 
and was tried out with a few subjects. The words which caused an 
unusually long pause before the response were considered too hard 
and were eliminated. 

During the first experiment in which the controlled-association 
test was used, the words were shown to the subject with a crude type 
of tachistoscope which required considerable skill in operation. After
the preliminary experiment had been carried out, a new and improved
device for exposing the words was made. The new tachistoscope
consisted of a small, portable, wooden box, in one side of which a
small opening was cut. The box was fitted to a detachable base and
was so arranged that it could be set up on a table with the slit about
on a horizontal line with the subject’s eyes. The one hundred words
in the test were printed on a strip of motion-picture film. The film 
was inserted lengthwise through the box in such a position that the 
words were exposed, one at a time, through the slit. A simple arrange- 
ment of cogwheels and a lever made it possible for the examiner to 
show the words rapidly and accurately. 

When the test was administered, the subject sat at a table facing 
the apparatus, which stood on the table at a distance of about eighteen 
inches from his eyes. The examiner sat at one side in such a position 
that he could manipulate the apparatus and could observe the subject 
and read the words which were exposed through the opening. He 
operated the lever which controlled the movable film with his left 
hand and held a stop-watch in his right hand. The directions given 
to the subject were as follows: 


I have a list of words here and I want you to tell me the opposites of them. I 
will show a word here (pointing to the slit in the box) and you are to say the oppo- 
site as quickly as you can. Then I will show another word and you will say the 
opposite of that as quickly as possible, and so on through the list. If you don’t 
think of the opposite word rather quickly just say “don’t remember” and we will
go on to the next word.¹





1 The last sentence was necessary because it was found that when it was not 
included in the directions some children who had difficulty in recalling the opposites 
of two or three words spent so much time in trying to think of them that their 
total time was increased greatly. Thus, it was possible for an individual who was 
The Subjects.—In the first experiments, each of the association
tests was administered to fifty seventh-grade pupils of the University 
of Chicago High School and to twenty-seven students in the Freshman 
class of the University of Chicago, during the Winter Quarter, 1931. 
All the university students were tested with both tests. Thirty-one 
of the seventh-grade pupils took both tests. 

The revised test of antonyms was given to sixty-seven ninth-grade
pupils in the Spring Quarter, 1931, to forty seventh-grade pupils in
the Autumn Quarter, 1931, and to fifty-one seventh-grade pupils
in the Spring Quarter, 1933. All the subjects were pupils in the
University of Chicago High School.

Rate of Reading Scores.—The rate of reading of all groups except
the last was measured in words read per second in continuous material
by means of a test constructed by the writer. The last group was
tested for rate of reading with the Monroe Standardized Silent Reading
Test, Test II.


RELIABILITY OF THE ASSOCIATION TESTS 


The total time required by each subject to complete both forms 
of each association test was used in the study of the correlation of 
association time with rate of reading. The total time used by a 
pupil on the association test included the time of recognizing and 
vocalizing the response as well as the time of associating the word 
with another word. It might be thought that variations in the rate 
of recognizing the words and in the rate of vocalization would intro- 
duce a spurious factor that would tend to invalidate the scores on the 
test. Difference in rate of vocalizing would obviously be an important 
factor if some of the subjects had speech defects. With normal 
subjects, however, recognition and vocalization speed seem to have 
little influence on the relative standing of the pupils. Data on this 
point were secured by having twenty seventh-grade pupils go through 
the list of words used in the test of antonyms twice. During the first 
trial they were required to say each word as quickly as possible after 
it was displayed. On the second trial they were instructed to respond 
with the opposites in the manner already described. The time taken
by each pupil on the first trial was then subtracted from his time on
the second trial.





really rapid in his associations to make a poor showing in total time because of 
deficiencies in vocabulary. 
It is reasonable to assume that the time of completing the first
trial included the subject’s rate of recognition and rate of vocalization 
and the examiner’s rate of operating the apparatus and that the total 
time of completing the second trial included these three factors plus 
rate of associating the opposites with the words. Therefore, when the 
time of doing the first trial was subtracted from the time of the second 
trial, the residue was the time used in making the associations between 
the words and their opposites. 

The corrected association scores, secured by subtracting the time
of trial 1 from the time of trial 2, were correlated with the scores on
trial 2; that is, they were correlated with the total time used by each
pupil on the opposites test. The correlation was r = .968 ± .010.
This correlation is so high that it appears that differences in rate of
word recognition and rate of vocalization had a negligible effect on
the association scores.
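The correction just described can be sketched as follows. The timing data are invented for illustration only, and the probable error is assumed to follow the conventional formula PE = .6745(1 − r²)/√N, which the article does not state explicitly; with r = .968 and N = 20 it gives about .0095, close to the reported .010. The names `pearson_r` and `probable_error` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def probable_error(r: float, n: int) -> float:
    """Conventional probable error of r (assumed formula, not quoted in the text)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Hypothetical times in seconds: trial 1 = naming the words, trial 2 = giving opposites.
trial1 = [30, 32, 29, 35, 31, 33]
trial2 = [95, 120, 88, 150, 101, 132]

# Corrected association score = trial 2 time minus trial 1 time.
corrected = [t2 - t1 for t1, t2 in zip(trial1, trial2)]
r_corrected = pearson_r(corrected, trial2)

if __name__ == "__main__":
    print(round(r_corrected, 3))
    print(round(probable_error(0.968, 20), 4))
```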

The reliability of the total score on the association tests was found 
by computing the correlation between the two forms and applying 
the Spearman-Brown formula for predicting the reliability of length- 
ened tests. The correlations are shown in Table I. 
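For reference, the Spearman-Brown step mentioned above can be written as a one-line function. The form below is the standard formula for a test lengthened by a factor k (k = 2 when two forms are combined); the input value in the example is hypothetical and is not taken from the study.

```python
def spearman_brown(r_between_forms: float, length_factor: float = 2.0) -> float:
    """Predicted reliability of a test lengthened by `length_factor`."""
    k = length_factor
    return k * r_between_forms / (1.0 + (k - 1.0) * r_between_forms)


if __name__ == "__main__":
    # Hypothetical correlation of .80 between two forms.
    print(round(spearman_brown(0.80), 3))
```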


TABLE I.—THE RELIABILITY OF THE FORTY-SIX-WORD FORM OF THE FREE-ASSOCIATION
TEST AND THE ONE HUNDRED-WORD FORM OF THE TEST OF ANTONYMS

Test                        | Seventh-grade pupils | Ninth-grade pupils | University freshmen
Free-association test       | [illegible]          | ...                | .895 ± .026
Antonyms (original test)    | [illegible]          | ...                | .963 ± .010
Antonyms (revised test)     | .973 ± .006          | .887 ± .018        | ...











All the correlations shown in Table I are above .88 and three of 
them are above .95. They indicate that those abilities which con- 
tributed to the association-test scores were measured very consistently. 


CORRELATION OF READING RATE WITH SPEED OF ASSOCIATION 


The rate of performance on the free-association test, the first 
opposites test, and the revised test of opposites was correlated with 
rate of reading for the students in each of the groups who took the 
tests. The coefficients of correlation are shown in Table II. 

All seven correlations between the association tests and the rate of
reading tests are significant. They suggest that association time is a
noteworthy factor in reading rate.
TABLE II.—CORRELATION OF SPEED OF ASSOCIATION WITH RATE OF READING

Test                        | Reading rate,        | Reading rate,      | Reading rate,
                            | seventh-grade pupils | ninth-grade pupils | university freshmen
Free-association test       | [illegible]          | ...                | .487 ± .100
Antonyms (Spring, 1931)     | [illegible]          | ...                | .565 ± .091
Antonyms (Fall, 1931)       | .558 ± .074          | .392 ± .071        | ...
Antonyms (Spring, 1933)     | .448 ± .076          | ...                | ...














However, it was found that both rate of reading and rate of asso- 
ciation were correlated positively with intelligence as measured by a 
timed group test.¹ This fact raised a question as to whether or not
rate of association and rate of reading gained their apparent correlation 
from the influence of intelligence. The question was investigated 
by the method of partial correlation, with intelligence held constant. 
The use of the method in this case may be criticized because it par- 
tialed out too much, since intelligence, if it is defined as the score made 
on a mental test within a time limit, is to some degree a function of 
rate of association and rate of reading. In other words, not only does 
intelligence influence the two rate factors, but they in turn influence 
the intelligence score. 

A certain value, however, will be derived from computing a partial 
correlation of this kind, if it is kept in mind that too much is taken 
out by the method. If the coefficient is so low that it is insignificant, 
this will not prove conclusively that there is no direct relationship 
between rate of association and rate of reading. If, however, there is 
significant correlation despite the fact that too much has been taken 
out, this will constitute evidence that speed of association is connected 
with reading rate directly and not by virtue of the relationship of both 
factors to intelligence. The partial correlation of rate on the associa- 
tion tests with rate of reading, with intelligence held constant, is 
shown in Table III. 
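The article does not print the formula it used, but the usual first-order partial correlation can be sketched as below. The function name `partial_correlation` and the three input correlations are hypothetical; they are chosen only to show how holding a third variable constant lowers an apparent relationship.

```python
from math import sqrt


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


if __name__ == "__main__":
    # Hypothetical values: 1 = rate of association, 2 = rate of reading,
    # 3 = intelligence score.
    r_assoc_reading = 0.50
    r_assoc_intel = 0.40
    r_reading_intel = 0.30
    print(round(partial_correlation(r_assoc_reading, r_assoc_intel, r_reading_intel), 3))
```

When the third variable is itself partly a function of the first two, as the text argues for a timed intelligence test, this subtraction removes more than the third variable's own influence.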

All the correlation coefficients were considerably reduced by partial- 
ing out intelligence. The amount of reduction for the seventh-grade 
pupils was .125, .106, .319, and .167; for the ninth-grade pupils, .046; 
and for the University Freshmen, .156 and .294. Nevertheless, three 





1 The Otis Self-administering Test of Mental Ability, Higher Examination,
was used with the high school pupils and the Thurstone Psychological Examination
was used with the University Freshmen.
TABLE III.—PARTIAL CORRELATION OF SPEED OF ASSOCIATION WITH RATE OF
READING WHEN INTELLIGENCE IS HELD CONSTANT

Association test            | Reading rate,        | Reading rate,      | Reading rate,
                            | seventh-grade pupils | ninth-grade pupils | University Freshmen
Free-association test       | [illegible]          | ...                | .331 ± .120
Antonyms (Spring, 1931)     | [illegible]          | ...                | .271 ± .122
Antonyms (Fall, 1931)       | .239 ± .100          | .346 ± .073        | ...
Antonyms (Spring, 1933)     | .281 ± .087          | ...                | ...





of the correlation coefficients (.403, .421, and .346) were certainly
statistically significant; one (.281) was probably significant; and the
remaining three (.239, .331, and .271) may be regarded as somewhat 
significant, although the evidence of relationship which they offer is 
slight, since none of them was as much as three times its probable 
error. In general, when all seven partial correlations are considered 
together, it appears that speed of association has some relationship 
to rate of reading that is independent of other aspects of mental 
ability which are measured by an intelligence test. 
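The rule of thumb applied above, comparing each coefficient with three times its probable error, can be made explicit with a short sketch. The coefficients and probable errors are the legible entries of Table III; the dictionary labels and the variable names are added here for illustration.

```python
# (r, probable error) pairs read from the legible cells of Table III.
partials = {
    "seventh grade, antonyms (Fall, 1931)": (0.239, 0.100),
    "ninth grade, antonyms (Fall, 1931)": (0.346, 0.073),
    "seventh grade, antonyms (Spring, 1933)": (0.281, 0.087),
    "university freshmen, free association": (0.331, 0.120),
    "university freshmen, antonyms (Spring, 1931)": (0.271, 0.122),
}

# Ratio of each coefficient to its probable error.
ratios = {label: r / pe for label, (r, pe) in partials.items()}

# Coefficients reaching at least three times their probable error.
at_least_three_pe = sorted(label for label, ratio in ratios.items() if ratio >= 3.0)

if __name__ == "__main__":
    for label, ratio in ratios.items():
        print(f"{label}: r/PE = {ratio:.2f}")
```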

The influence of vocabulary on the correlation between rate of 
reading and speed of association was investigated in the same way. 
The seventh-grade groups, only, were used in this part of the study. 
The first three groups were tested with the Inglis Test of English 
Vocabulary, which contains one hundred fifty words and is not timed. 
The last group was tested with a non-standardized vocabulary test 
of fifty words, which all the pupils were allowed to finish. As neither 
vocabulary test was timed, presumably the scores were not affected 
by the rate at which the various individuals worked. Therefore, the 
correlation between rate of association and reading rate, with vocabu- 
lary held constant, should approximate the true relationship between 
these abilities among individuals who possess equivalent knowledge 
of word meaning. The partial correlation of each of the association 
tests and rate of reading, when vocabulary was held constant, is shown 
in Table IV. 

The amounts of reduction in the correlation between the associa- 
tion tests and rate of reading that resulted from partialing out vocabu- 
lary were .103, .045, .207, and .114, respectively. Notwithstanding 
these reductions, two of the correlations were more than four times 
the probable error and the other two were approximately four times 
TABLE IV.—PARTIAL CORRELATION OF SPEED OF ASSOCIATION WITH RATE OF
READING WHEN VOCABULARY IS HELD CONSTANT

Tests correlated                                         | r ± PE
Free-association test and rate of reading                | .482 ± .078
Opposites test and rate of reading (Spring, 1931)        | .425 ± .072
Opposites test and rate of reading (Fall, 1931)          | .351 ± .093
Opposites test and rate of reading (Spring, 1933)        | .334 ± .084





the probable error. The correlations indicate that there is a relation 
between rate of reading and association time which is independent of 


vocabulary. 


SUMMARY 


Evidence of a direct relation between rate of reading and speed of 
association was found with five groups of pupils. The effect of eye- 
movement was eliminated by the conditions of the study, and the 
influence of mental ability and knowledge of vocabulary was taken 
out by partial correlation; nevertheless, the evidence of relationship 
persisted. This relationship, while it was not marked, was positive 
and was high enough to be significant. In the case of some pupils, 
there seems to be little or no correlation between the rate at which
they associate ideas with words and the rate at which they read; with
other pupils the relationship is apparently very close. There is
ground for thinking that slow association rate may be so closely 
related to the retarded reading rate of some slow readers that the 
teacher should not utilize the usual methods to get them to read more 
rapidly. The problem needs further study with a refined technique. 
HOTELLING’S METHOD MODIFIED TO GIVE
SPEARMAN’S g


GODFREY H. THOMSON 
Moray House, University of Edinburgh 


The object of the present paper is to state a modification of Hotell- 
ing’s process which, instead of taking out the “largest principal 
component” from a perfect hierarchy will take out Spearman’s g; 
and to make some remarks on the meaning of the two processes which 
are suggested by the comparison. 

Hotelling’s process refers the correlations between n tests to n 
components, equal in number to the tests (though some of them may 
be negligible). The first component $\gamma_1$ is that whose contribution
to the total variances of the test-scores is as large as possible, and the 
second and succeeding components make smaller and smaller con- 
tributions to the total variance n. 

Spearman refers any set of “hierarchical” correlations, between n
tests which have tetrad differences not significantly different from
zero, to n + 1 components (one more than Hotelling), namely one
g and n specifics. His tests are represented, not in n-fold space (as
with Hotelling) but in (n + 1)-fold space, and in this hyperspace
each line representing a test is in one of the n orthogonal planes which
have the g line as a common axis and pass each through one of the
n other axes; each test depends upon g and upon its own particular
s and no other. If Hotelling’s process is applied to a set of hierarchical
correlations it extracts a principal component which accounts for more
of the total variance n than does Spearman’s g,¹ and its later components
are “interference factors” which help in some, hinder in other
of the tests. For example if applied to the perfect hierarchy

          $z_1$    $z_2$    $z_3$    $z_4$    $z_5$
$z_1$     1.       .669     .592     .458     .251
$z_2$     .669     1.       .566     .438     .240
$z_3$     .592     .566     1.       .387     .212
$z_4$     .458     .438     .387     1.       .164
$z_5$     .251     .240     .212     .164     1.


Hotelling’s process gives for the five principal components the 
coefficients: 





1 Moreover the Hotelling principal component, unlike g, changes if another test 
is added to the hierarchy. 
                                $\gamma_1$   $\gamma_2$   $\gamma_3$   $\gamma_4$   $\gamma_5$  | Spearman’s g
Variance contributed            2.683    0.890    0.652    0.448    0.328  |
Percentage of total variance    53.7     17.8     13.0     9.0      6.6    | 44.6
$z_1$                           .856     -.092    -.152    -.217    -.436  | .837
$z_2$                           .840     -.098    -.173    -.346    .365   | .800
$z_3$                           .790     -.116    -.294    .523     .060   | .707
$z_4$                           .673     -.182    .713     .083     .020   | .548
$z_5$                           .413     .908     .069     .022     .009   | .300

On the right of the table are given the coefficients of g in the
Spearman equations from which the hierarchy of correlation coefficients
was constructed. They are all smaller than the coefficients
of Hotelling’s $\gamma_1$, and g only takes out 44.6 per cent of the variance 5,
against 53.7 per cent taken out by $\gamma_1$. Hotelling’s $\gamma_2$ approximates to
a specific factor inasmuch as it has an outstandingly large coefficient
in one variate. But the later components are more and more strongly
interference factors, helping in some and hindering in other variates;
until $\gamma_5$ is almost equally helping $z_2$ and hindering $z_1$ (or vice versa,
for the signs can be reversed).

As one is performing calculations like the above, one becomes
aware that the influence of the units in the diagonal is the cause of
the high coefficients of $\gamma_1$. I therefore asked myself what quantities
placed in the diagonals of the hierarchical matrix of correlations would
cause Hotelling’s form of calculation to give a first component (no
longer a principal component in his sense) which was identical with g:
and found that these are the squares of the correlation of each variate
with g. If each variate is of the form

$$z_i = l_i g + \sqrt{1 - l_i^2}\; s_i,$$

the matrix of correlations, with $l_i^2$ in the diagonal cells, is

$$\begin{array}{c|cccccc}
      & z_1     & z_2     & z_3     & z_4     & \cdots & z_n     \\ \hline
z_1   & l_1^2   & l_1 l_2 & l_1 l_3 & l_1 l_4 & \cdots & l_1 l_n \\
z_2   & l_2 l_1 & l_2^2   & l_2 l_3 & l_2 l_4 & \cdots & l_2 l_n \\
z_3   & l_3 l_1 & l_3 l_2 & l_3^2   & l_3 l_4 & \cdots & l_3 l_n \\
z_4   & l_4 l_1 & l_4 l_2 & l_4 l_3 & l_4^2   & \cdots & l_4 l_n \\
\vdots &        &         &         &         &        &         \\
z_n   & l_n l_1 & l_n l_2 & l_n l_3 & l_n l_4 & \cdots & l_n^2
\end{array}$$

and if the rows of this matrix are multiplied by the multipliers

$$1,\ \frac{l_2}{l_1},\ \frac{l_3}{l_1},\ \frac{l_4}{l_1},\ \ldots,\ \frac{l_n}{l_1}$$

and the columns then added to obtain the totals $t_1, t_2, t_3, t_4, \ldots, t_n$,
we find

$$t_1 = \Sigma l^2, \qquad t_2 = \frac{l_2}{l_1}\,\Sigma l^2, \qquad \ldots, \qquad t_n = \frac{l_n}{l_1}\,\Sigma l^2.$$

The new multipliers by Hotelling’s rule are then

$$1,\ \frac{t_2}{t_1},\ \frac{t_3}{t_1},\ \frac{t_4}{t_1},\ \ldots,\ \frac{t_n}{t_1}$$

and are therefore equal to

$$1,\ \frac{l_2}{l_1},\ \frac{l_3}{l_1},\ \frac{l_4}{l_1},\ \ldots,\ \frac{l_n}{l_1},$$

that is, they are the multipliers we began with. The coefficients of
the component, again after Hotelling’s rule, are

$$\frac{l_i}{l_1}\sqrt{\frac{\Sigma l^2}{1 + l_2^2/l_1^2 + \cdots + l_n^2/l_1^2}} = l_i \qquad (i = 1, \ldots, n),$$

so that the component thus taken out is identical with g.

The above matrix of the l’s has all its tetrad-differences zero, 
including also those which involve an element in the principal diagonal. 
Its determinant is therefore zero, and this might well have involved 
failure in any attempt to arrive at the multipliers by an iterative 
process. I find however by trial that an iterative process actually 
works if we follow Hotelling’s method with the following modification: 
Start with units in the diagonal, and guess the first multipliers as
in Hotelling. From the weighted totals t of the columns form the
new multipliers, also as in Hotelling, by dividing by the largest of
them, say $t_1$. But before using the new multipliers, the diagonal cell
contents are replaced by quantities

$$\frac{t_1\, t_i^2}{\Sigma t^2}$$

(i.e. the squares of Hotelling’s coefficients), and the whole process
repeated again and again till it settles down to giving no change in 
the multipliers. 
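
The iteration just described can be sketched in a few lines of plain Python. This is only an illustrative reading of the verbal rule, not the author's own computing scheme: the function name, the stopping tolerance and the iteration cap are assumptions introduced here, and the routine returns the converged diagonal cells (the squared coefficients) rather than the coefficients themselves.

```python
def modified_hotelling_g(R, multipliers=None, tol=1e-12, max_iter=10_000):
    """Sketch of the modified process: returns the converged diagonal cells.

    R           -- square matrix of correlations (list of lists)
    multipliers -- first guess at the multipliers (defaults to all ones)
    tol, max_iter are assumptions of this sketch, not part of the paper.
    """
    n = len(R)
    A = [[float(x) for x in row] for row in R]
    for i in range(n):
        A[i][i] = 1.0                      # start with units in the diagonal
    m = [1.0] * n if multipliers is None else [float(x) for x in multipliers]

    for _ in range(max_iter):
        # weighted column totals: t_j = sum over rows i of m_i * A[i][j]
        t = [sum(m[i] * A[i][j] for i in range(n)) for j in range(n)]
        t_max = max(t, key=abs)            # the largest total (t_1 in the text)
        sum_sq = sum(x * x for x in t)
        new_m = [x / t_max for x in t]     # new multipliers, as in Hotelling
        new_diag = [t_max * x * x / sum_sq for x in t]   # t_1 * t_i^2 / sum t^2

        change = max(abs(new_diag[i] - A[i][i]) for i in range(n))
        change = max(change, max(abs(a - b) for a, b in zip(new_m, m)))

        for i in range(n):                 # replace the diagonal before reuse
            A[i][i] = new_diag[i]
        m = new_m
        if change < tol:
            break
    return [A[i][i] for i in range(n)]


hierarchy = [
    [1.00, 0.56, 0.48, 0.40],
    [0.56, 1.00, 0.42, 0.35],
    [0.48, 0.42, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
]
diagonal = modified_hotelling_g(hierarchy, multipliers=[1.0, 0.8, 0.6, 0.4])
```

Applied to the four-test hierarchy used in the worked example, with the same guessed multipliers, the first pass of this sketch gives the totals 1.896, 1.752, 1.536, 1.260, and the diagonal cells settle toward the squares of .8, .7, .6 and .5.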
AN EXAMPLE, ON A PERFECT HIERARCHY

As an example take the perfect hierarchy given below

                1.00     .56      .48      .40   | 1.0
                 .56    1.00      .42      .35   |  .8   guessed
                 .48     .42     1.00      .30   |  .6   multipliers
                 .40     .35      .30     1.00   |  .4

                1.000    .560     .480     .400
                 .448    .800     .336     .280
                 .288    .252     .600     .180
                 .160    .140     .120     .400
t               1.896   1.752    1.536    1.260
m               1.       .92      .81      .66
t²              3.595   3.070    2.359    1.588  | Σ = 10.612
(t₁/Σ)·t²        .64     .55      .42      .28   | check sum = t₁

The procedure with the guessed multipliers is shown above.
The matrix is now rewritten with the quantities .64, .55, .42, .28 in the
diagonal, and the multipliers 1.00, .92, .81, .67 used on it. After each
repetition of the process the new diagonal cells are substituted for the
old. I find the following values for the diagonal cells at each successive
step:

Diagonal cell contents
 .64      .55      .42      .28
 .64      .52      .39      .26
 .64      .51      .37      .25
 .64      .50      .36      .25
 .640     .495     .360     .250
 .640     .493     .360     .250
 .640     .492     .360     .250
 .640     .491     .360     .250
 .6400    .4906    .3600    .2500



It is clear that the results are settling down to $.8^2$, $.7^2$, $.6^2$, $.5^2$, which
are the values agreeing with the Spearman equations from which the
correlations were made, namely

$$\begin{aligned}
z_1 &= .8g + \sqrt{.36}\; s_1 \\
z_2 &= .7g + \sqrt{.51}\; s_2 \\
z_3 &= .6g + \sqrt{.64}\; s_3 \\
z_4 &= .5g + \sqrt{.75}\; s_4
\end{aligned}$$

If we now premultiply the vector [.8 .7 .6 .5] by its transpose we
obtain the matrix

          .8      .7      .6      .5
 .8  |   .64     .56     .48     .40
 .7  |   .56     .49     .42     .35
 .6  |   .48     .42     .36     .30
 .5  |   .40     .35     .30     .25

and subtracting this from the original matrix we have

         .36      0       0       0
          0      .51      0       0
          0       0      .64      0
          0       0       0      .75

which, treated by Hotelling’s unmodified process, gives the four
specifics.

Trials on other hierarchical matrices, including trials beginning 
with the most unlikely guesses for the first multipliers, have confirmed 
the process, which, it will be seen, takes out g by a modified calcula- 
tion, and then the n specifics by the unmodified Hotelling process. 


The results agree exactly with Spearman’s method when the hierarchy 
is strictly exact. 


APPLICATION TO NON-HIERARCHICAL MATRICES 


The process can be applied also to matrices of correlation 
coefficients which are not exactly hierarchical, but are sufficiently 
so. It does not then however give exactly the same results either
as the average of all the quantities $r_{ij} r_{ik}/r_{jk}$ or as Spearman’s formula
$(A_i^2 - A_i')/(T - 2A_i)$, which is the fraction $\Sigma(r_{ij} r_{ik})/\Sigma r_{jk}$. I do not
know that either of these last two compromises, when the correlations 
do not satisfy the tetrad relations exactly, has any special theoretical 
justification, and the new, slightly different, compromise may possibly 
turn out to be best, or may show the way to the best value. 

If applied to matrices which definitely are not hierarchical the 
new process will give a component and supply its contributions to 
the variances. But such a component does not appear to have any 
helpful significance, any more than would g calculated in such a case 
by Spearman’s methods. If we take for example the variates: 


$$\begin{aligned}
z_1 &= .7g + .5k + \sqrt{.26}\; s_1 \\
z_2 &= .6g + .6k + \sqrt{.28}\; s_2 \\
z_3 &= .5g + .3k + \sqrt{.66}\; s_3 \\
z_4 &= .4g + \sqrt{.84}\; s_4
\end{aligned}$$


the various processes give first components whose contributions to
the variances of the four variates are shown in the columns of the
following table:

         Hotelling’s   New process   (A² − A′)/(T − 2A)   Average of             Equation   Equation
         $\gamma_1$          g′                                  $r_{ij} r_{ik}/r_{jk}$    g + k      g alone
$z_1$    .77           .76           .763                 .763                   .74        .49
$z_2$    .74           .67           .647                 .628                   .72        .36
$z_3$    .55           .34           .352                 .364                   .34        .25
$z_4$    .22           .10           .101                 .102                   .16        .16




















It is true that in this case either Spearman’s or the new process 
approximates, in the contributions to the variances, to the joint 
contributions of the g and k of the equations. But as the same 
correlations could have been produced by other equations with 
different general and group factors this appears to be fortuitous. 
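
The two Spearman columns of the table can be checked with a short computation. The sketch below is a hedged illustration: the function names are hypothetical, T is taken as the sum of all off-diagonal correlations counted in both triangles (an assumption, chosen because it reproduces the tabled figures), and the correlations are rebuilt from the g and k coefficients of the four equations.

```python
from itertools import combinations


def correlations_from_loadings(g, k):
    """Correlations implied by z_i = g_i*g + k_i*k + specific (unit variances)."""
    n = len(g)
    return [[1.0 if i == j else g[i] * g[j] + k[i] * k[j] for j in range(n)]
            for i in range(n)]


def spearman_formula(R):
    """(A^2 - A') / (T - 2A) for each variate.

    A  = sum of the correlations of variate i with the others,
    A' = sum of their squares,
    T  = sum of all off-diagonal correlations (both triangles; assumed).
    """
    n = len(R)
    T = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    out = []
    for i in range(n):
        A = sum(R[i][j] for j in range(n) if j != i)
        A_prime = sum(R[i][j] ** 2 for j in range(n) if j != i)
        out.append((A * A - A_prime) / (T - 2 * A))
    return out


def average_of_triads(R):
    """Average of r_ij * r_ik / r_jk over all pairs (j, k) not involving i."""
    n = len(R)
    out = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        vals = [R[i][j] * R[i][k] / R[j][k] for j, k in combinations(others, 2)]
        out.append(sum(vals) / len(vals))
    return out


R = correlations_from_loadings([0.7, 0.6, 0.5, 0.4], [0.5, 0.6, 0.3, 0.0])
by_formula = spearman_formula(R)
by_average = average_of_triads(R)
```

Both lists agree with the third and fourth columns of the table to within rounding in the third decimal place.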


APPLICATION TO ONE CORRELATION ONLY 


It is peculiarly instructive to consider the simplest possible case
with only two variates and one correlation, for example say

           $z_1$     $z_2$
$z_1$      1.0       .7
$z_2$       .7      1.0

Here Hotelling’s process gives

$$\begin{aligned}
z_1 &= \sqrt{.85}\;\gamma_1 + \sqrt{.15}\;\gamma_2 \\
z_2 &= \sqrt{.85}\;\gamma_1 - \sqrt{.15}\;\gamma_2
\end{aligned}$$

that is, each variate is made up of the same components with the
same coefficients, except for the sign which indicates that one or other
of the components is an interference factor.

Spearman’s own method is inapplicable here because it requires
at least three variates to find the correlations with g by the formula
$(r_{ij} r_{ik}/r_{jk})^{1/2}$. The application of the above modified process however
gives an interesting result. If, as would be natural enough, we guess
the multipliers to be equal, we proceed thus:

               1        .7     | 1
               .7       1      | 1
t              1.7      1.7
t²             2.89     2.89   | Σ = 5.78
t₁t²/Σ         .85      .85

               .85      .70    | 1
               .70      .85    | 1
t              1.55     1.55
t₁t²/Σ         .775     .775

               .775     .700   | 1
               .700     .775   | 1
t              1.475    1.475
t₁t²/Σ         .7375    .7375

and so on, the quantities $t_1 t^2/\Sigma$ reducing asymptotically to .7. Eventually
we shall arrive at the matrix

               .7       .7
               .7       .7

leaving the residues

               .3       0
               0        .3

to be explained by two specifics. This gives us therefore the equations

$$\begin{aligned}
z_1 &= \sqrt{.7}\; g + \sqrt{.3}\; s_1 \\
z_2 &= \sqrt{.7}\; g + \sqrt{.3}\; s_2
\end{aligned}$$


If however we had not guessed equal multipliers at the outset, the
matrix would not have settled down to the form

               .7       .7
               .7       .7

but to the form

               a        .7
               .7       b

where $(ab)^{1/2} = .7$. This is merely in accord with the fact that the
equations

$$\begin{aligned}
z_1 &= \sqrt{a}\; g + \sqrt{1 - a}\; s_1 \\
z_2 &= \sqrt{b}\; g + \sqrt{1 - b}\; s_2,
\end{aligned}$$

where $(ab)^{1/2} = .7$, will also give us our sole datum $r_{12} = .7$. The
Hotelling process explains the total variance and the correlation by 
a diagram like Fig. 1, where one or other of the portions is an inter- 
ference factor. The modified process corresponds to diagrams like 
Fig. 2 or like Fig. 3 where the correlation is imperfect, not because of 


an interference factor, but because of extraneous factors which are 
not common to the two variates. 
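
The steady approach to .7 in the equal-multiplier computation for two variates can be seen in closed form. The following short derivation is a sketch added for illustration; the symbols $d_k$ (the common diagonal cell after k steps) and $r$ (the single correlation) are notation introduced here, not the author's.

```latex
% Equal multipliers (1, 1), diagonal cell d_k, correlation r:
% both column totals are t = d_k + r, so that
\[
  d_{k+1} \;=\; \frac{t_1\,t^2}{\Sigma t^2}
          \;=\; \frac{(d_k + r)^3}{2\,(d_k + r)^2}
          \;=\; \frac{d_k + r}{2},
  \qquad\text{hence}\qquad
  d_{k+1} - r \;=\; \tfrac{1}{2}\,(d_k - r).
\]
% With d_0 = 1 and r = .7 this gives .85, .775, .7375, ... -> .7,
% and the new multipliers t_2/t_1 remain equal to 1 throughout.
```

The excess of the diagonal cell over the correlation is therefore halved at every step, which is why the successive values .85, .775, .7375 close in on .7.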


[Figs. 1, 2 and 3: diagrams referred to in the text.]


Applied to a set of hierarchical correlations of which three can be
found such that

$$\frac{r_{ij}\, r_{ik}}{r_{jk}} > 1$$

the modified process gives a coefficient of the first component which
is greater than unity, and requires a specific later which has as coefficient
a multiple of $\sqrt{-1}$, that is, is imaginary; just as Spearman’s
analysis does in like circumstances.


SUMMARY 


A modification of Hotelling’s process is described which when 
applied to a hierarchical matrix of correlations gives Spearman’s g, 
after which the unmodified process gives the specifics; and the interpre- 
tation of the two processes is discussed. 
OPTIMUM ORDERS FOR THE PRESENTATION OF
PAIRS IN THE METHOD OF PAIRED COMPARISONS


ROBERT T. ROSS 
Institute of Human Relations, Yale University 


In the use of the method of paired comparisons the question of the 
order of pairs in the stimulus series is always a matter of importance. 
It is desirable that the experimental series should (1) eliminate space 
and time errors, (2) avoid regular repetitions which might influence 
judgment, and (3) maintain the greatest possible spacing between 
pairs involving any given member of the stimulus group. Of the 
various methods previously suggested for determining this optimal 
order none is free from criticism. Fechner¹ originally suggested in
1871 a purely chance order, and thus made the series liable to all of 
the above undesirable features with the possible exception of the 
second. 

Cohn’s² method of 1894, while it may be balanced to avoid space
and time errors, not only fails to maintain the maximum spacing 
between identical members, but advocates a minimum spacing with a 
highly suggestive repetitive element; thus 2-1, 1-3, 3-2, 2-4, 4-3, 3-6, 
etc. In contrast with this Külpe’s³ suggestion (1907), in which the
first member is first compared with all the other members (1-2, 1-3,
1-4, 1-5, . . . 1-n), and then the second is compared with all the
remaining ones (2-3, 2-4, 2-5, 2-6, . . . 2-n), and so on, violates
all of the rules for an adequate order. In 1904, Kowalewski⁴ had
established orders which, when balanced, comply with the require- 
ments set forth above; particularly that of maximal spacing. How- 
ever, Kowalewski empirically determined these orders for series 
involving five, seven, and fifteen members only, and states no general 
method. 

In the face of this general situation, Beebe-Center⁵ wrote in
1932, “Of these various modes of presentation, that advocated by
Kowalewski is obviously the best. It is, however, very difficult to 
prepare except in the case of five, seven or fifteen stimuli for which 
one can use the series worked out by Kowalewski himself. In other 
cases it seems most expedient to arrange a sequence of pairs of stimuli 
in accordance with Cohn’s procedure, thus precluding any influence of 
the time-error or of the space-error, and then to arrange the sequence 
by trial and error in such a fashion that no two successive pairs con-
tain a common member.”

In the present paper a method for the presentation of stimuli will 
be advocated which combines the “balance” feature of Cohn’s series 
with the maximal spacing of Kowalewski’s series, and which at the 
same time avoids suggestive repetitions and is applicable to experi- 
mental situations involving any number of members. 

Without going into the mathematical reasoning involved, the 
following are the characteristics of the orders developed: (1) The 
method presented is directly applicable only to series involving an 
odd number of members. If it is desired to find the optimal order 
for an even number of members, the optimal order for the odd num- 
ber next higher than the given even number should be determined. 
From this latter, odd-numbered order all of the pairs involving the 
extra member should be eliminated, and the resulting order will be 
the optimum one for the even-numbered order sought. Thus if it 
were desired to find the optimum order for a group of fourteen mem- 
bers, the optimum order for fifteen members would be determined and 
from this latter order all pairs involving 15 would be eliminated (i.e.
1-15, 2-15, 3-15, etc.). The resulting series would give the optimum
order for fourteen members. (2) Within any order for an odd number
of elements, pairs involving the same member will be separated by a
maximum of (n − 1)/2 pairs, and by a minimum of (n − 3)/2 pairs. Within
any order for an even number of elements (m), pairs involving the
same member will be separated by a maximum of [expression illegible] pairs, and
by a minimum of [expression illegible] pairs. (3) The series obtained from the





formulae presented may be “balanced”; that is, so arranged that 
any given member appears an equal number of times as the first and 
second member of a pair. This is not quite true for the even-numbered 
series, for in this case there are (m — 1) pairs involving any given 
member, and since m is even, (m — 1) is odd; it is therefore impossible 
to balance the order perfectly, but it may be balanced within the 
limits of possibility. (4) If the orders are repeated in reverse arrange- 
ment directly after the experiment has been completed in the given 
order, the effects of fatigue may be balanced out. Also, such a plan 
insures perfect balancing for series of an even number of members. 
PROCEDURE FOR OBTAINING ORDERS


Tables I and II form a single unit for the determination of the 
optimum order for any number of members (n), where n is an odd 
number. 


















































































































































TABLE I

First half series          | First term  | Second term      | Third term       | Fourth term      | ... | Final, (n−1)/2 th, term
First entry                | 1   2       | n   3            | n−1   4          | n−2   5          | ... | (n+5)/2   (n+1)/2
Second entry               | 1   3       | 2   4            | n   5            | n−1   6          | ... | (n+7)/2   (n+3)/2
Third entry                | 1   4       | 3   5            | 2   6            | n   7            | ... | (n+9)/2   (n+5)/2
Fourth entry               | 1   5       | 4   6            | 3   7            | 2   8            | ... | (n+11)/2  (n+7)/2
...
Final, (n−1)/2 th, entry   | 1  (n+1)/2  | (n−1)/2  (n+3)/2 | (n−3)/2  (n+5)/2 | (n−5)/2  (n+7)/2 | ... | 2   n−1

TABLE II

Second half series         | First term  | Second term      | Third term       | Fourth term      | Fifth term       | ... | Final, (n+1)/2 th, term
First entry                | 1  (n+3)/2  | 2   3            | n   4            | n−1   5          | n−2   6          | ... | (n+5)/2   (n+3)/2
Second entry               | 1  (n+5)/2  | 3   4            | 2   5            | n   6            | n−1   7          | ... | (n+7)/2   (n+5)/2
Third entry                | 1  (n+7)/2  | 4   5            | 3   6            | 2   7            | n   8            | ... | (n+9)/2   (n+7)/2
Fourth entry               | 1  (n+9)/2  | 5   6            | 4   7            | 3   8            | 2   9            | ... | (n+11)/2  (n+9)/2
...
Final, (n−1)/2 th, entry   | 1   n       | (n+1)/2  (n+3)/2 | (n−1)/2  (n+5)/2 | (n−3)/2  (n+7)/2 | (n−5)/2  (n+9)/2 | ... | 2   n


The Tables should be read from left to right beginning with the 
first entry of Table I and reading horizontally to its end, then go 
right on to the first entries of Table II reading from left to right; then 
read the second entries of Table I, then the second entries of Table II, 
and so forth to the end. The series must then be balanced. 


Example. If n = 15, then there will be (n − 1)/2, or seven, entries
in the tables, seven terms in Table I, and eight terms in Table II.
First fill in all numbers to be calculated from the formulae given in
the final terms and entries of both tables, then fill in the remaining
blanks as indicated. The completed tables should appear as shown in
Tables III and IV.



























































TABLE III

First half series | First term | Second term | Third term | Fourth term | Fifth term | Sixth term | Seventh term
First entry       | 1   2      | 15   3      | 14   4     | 13   5      | 12   6     | 11   7     | 10   8
Second entry      | 1   3      | 2    4      | 15   5     | 14   6      | 13   7     | 12   8     | 11   9
Third entry       | 1   4      | 3    5      | 2    6     | 15   7      | 14   8     | 13   9     | 12  10
Fourth entry      | 1   5      | 4    6      | 3    7     | 2    8      | 15   9     | 14  10     | 13  11
Fifth entry       | 1   6      | 5    7      | 4    8     | 3    9      | 2   10     | 15  11     | 14  12
Sixth entry       | 1   7      | 6    8      | 5    9     | 4   10      | 3   11     | 2   12     | 15  13
Seventh entry     | 1   8      | 7    9      | 6   10     | 5   11      | 4   12     | 3   13     | 2   14

TABLE IV

Second half series | First term | Second term | Third term | Fourth term | Fifth term | Sixth term | Seventh term | Eighth term
First entry        | 1   9      | 2   3       | 15   4     | 14   5      | 13   6     | 12   7     | 11   8       | 10   9
Second entry       | 1  10      | 3   4       | 2    5     | 15   6      | 14   7     | 13   8     | 12   9       | 11  10
Third entry        | 1  11      | 4   5       | 3    6     | 2    7      | 15   8     | 14   9     | 13  10       | 12  11
Fourth entry       | 1  12      | 5   6       | 4    7     | 3    8      | 2    9     | 15  10     | 14  11       | 13  12
Fifth entry        | 1  13      | 6   7       | 5    8     | 4    9      | 3   10     | 2   11     | 15  12       | 14  13
Sixth entry        | 1  14      | 7   8       | 6    9     | 5   10      | 4   11     | 3   12     | 2   13       | 15  14
Seventh entry      | 1  15      | 8   9       | 7   10     | 6   11      | 5   12     | 4   13     | 3   14       | 2   15





















































The completed series will read as follows: 1-2, 15-3, 14-4, 13-5, 
12-6, 11-7, 10-8, 1-9, 2-3, 15-4, 14-5, 13-6, 12-7, 11-8, 10-9, 1-3, 
2-4, 15-5, 14-6, 13-7, 12-8, 11-9, 1-10, 3-4, 2-5, etc. The balanced
series is given in the orders at the end of this article (Table VII). 

Another method, which in some respects is more easily accom- 
plished than the one presented above, requires the substitution of the 
necessary values in the following table and the reading of the optimum 


order downward in columns from left to right. 


TABLE V

I        | II       | III      | IV       | V        | VI       | VII      | Etc.
1    2   | 2    3   | 1    3   | 3    4   | 1    4   | 4    5   | 1    5   | Etc.
n    3   | n    4   | 2    4   | 2    5   | 3    5   | 3    6   | 4    6   | Etc.
n−1  4   | n−1  5   | n    5   | n    6   | 2    6   | 2    7   | 3    7   | Etc.
n−2  5   | n−2  6   | n−1  6   | n−1  7   | n    7   | n    8   | 2    8   | Etc.
n−3  6   | n−3  7   | n−2  7   | n−2  8   | n−1  8   | n−1  9   | n    9   | Etc.
n−4  7   | n−4  8   | n−3  8   | n−3  9   | n−2  9   | n−2  10  | n−1  10  | Etc.
n−5  8   | n−5  9   | n−4  9   | n−4  10  | n−3  10  | n−3  11  | n−2  11  | Etc.
n−6  9   | n−6  10  | n−5  10  | n−5  11  | n−4  11  | n−4  12  | n−3  12  | Etc.
n−7  10  | n−7  11  | n−6  11  | n−6  12  | n−5  12  | n−5  13  | n−4  13  | Etc.
n−8  11  | n−8  12  | n−7  12  | n−7  13  | n−6  13  | n−6  14  | n−5  14  | Etc.
Etc.     | Etc.     | Etc.     | Etc.     | Etc.     | Etc.     | Etc.     | Etc.

In this table (n + 1)/2 rows will be used, and n − 1 columns. It may
easily be shown that at the (n + 1)/2 th row, the fixed number on the chart
will be (n + 1)/2 + 1 = (n + 3)/2, and the number on the variable scale will be
n − (n − 3)/2 = (n + 3)/2, so that at this point the number on the fixed
scale is identical with that on the variable scale. In reading this
entry, therefore, the following change must be made in the chart:
In the odd-numbered columns (i.e. where repetition occurs) disregard
the number in the variable column and replace it by one. In the even
numbered columns, disregard the entry in this row altogether. The
final order is read downward in successive columns from left to right.

Example. If n = 15, then the number of entries is eight, and the
number of columns, fourteen. The completed chart is shown in
Table VI.

TABLE VI

I      | II     | III    | IV     | V      | VI     | VII    | VIII   | IX     | X      | XI     | XII    | XIII   | XIV
1   2  | 2   3  | 1   3  | 3   4  | 1   4  | 4   5  | 1   5  | 5   6  | 1   6  | 6   7  | 1   7  | 7   8  | 1   8  | 8   9
15  3  | 15  4  | 2   4  | 2   5  | 3   5  | 3   6  | 4   6  | 4   7  | 5   7  | 5   8  | 6   8  | 6   9  | 7   9  | 7  10
14  4  | 14  5  | 15  5  | 15  6  | 2   6  | 2   7  | 3   7  | 3   8  | 4   8  | 4   9  | 5   9  | 5  10  | 6  10  | 6  11
13  5  | 13  6  | 14  6  | 14  7  | 15  7  | 15  8  | 2   8  | 2   9  | 3   9  | 3  10  | 4  10  | 4  11  | 5  11  | 5  12
12  6  | 12  7  | 13  7  | 13  8  | 14  8  | 14  9  | 15  9  | 15 10  | 2  10  | 2  11  | 3  11  | 3  12  | 4  12  | 4  13
11  7  | 11  8  | 12  8  | 12  9  | 13  9  | 13 10  | 14 10  | 14 11  | 15 11  | 15 12  | 2  12  | 2  13  | 3  13  | 3  14
10  8  | 10  9  | 11  9  | 11 10  | 12 10  | 12 11  | 13 11  | 13 12  | 14 12  | 14 13  | 15 13  | 15 14  | 2  14  | 2  15
1   9  | ..     | 1  10  | ..     | 1  11  | ..     | 1  12  | ..     | 1  13  | ..     | 1  14  | ..     | 1  15  | ..

























































































The order then reads as follows: 1-2, 15-3, 14-4, 13-5, 12-6, 11-7, 
10-8, 1-9, 2-3, 15-4, 14-5, 13-6, 12-7, 11-8, 10-9, 1-3, 2-4, 15-5, etc.

It will be seen that this order is identical with that obtained by the 
first method, and that both are identical with that obtained empirically 
by Kowalewski. Devices for performing these calculations auto- 
matically are easily constructed. 
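
One such device, in modern terms, is a short program. The sketch below is an illustrative reading of the tables, not the author's own formulation: the function name and the internal helper are hypothetical, the pattern is inferred from the worked tables for n = 15, and the balancing of first and second positions within each pair is deliberately left out because no explicit rule for it is stated. An even number of members is handled by the rule given earlier, building the next higher odd order and dropping every pair that involves the extra member.

```python
def optimum_order(m):
    """Unbalanced order of pairs for m members (m >= 3), as a list of tuples."""
    n = m if m % 2 == 1 else m + 1        # even case: use the next odd number
    half = (n - 1) // 2

    def descending(start, count):
        # walk downward through the cycle n, n-1, ..., 3, 2, n, ... from start
        out, x = [], start
        for _ in range(count):
            out.append(x)
            x = x - 1 if x > 2 else n
        return out

    pairs = []
    for c in range(1, half + 1):
        # first half of entry c (Table I): 1 with c+1, then (n-3)/2 further pairs
        pairs.append((1, c + 1))
        lefts = descending(c if c >= 2 else n, half - 1)
        pairs.extend((a, c + 2 + i) for i, a in enumerate(lefts))
        # second half of entry c (Table II): 1 with c+(n+1)/2, then (n-1)/2 pairs
        pairs.append((1, c + half + 1))
        lefts = descending(c + 1, half)
        pairs.extend((a, c + 2 + i) for i, a in enumerate(lefts))

    if n != m:                            # drop pairs involving the extra member
        pairs = [p for p in pairs if n not in p]
    return pairs


order_15 = optimum_order(15)   # begins (1, 2), (15, 3), (14, 4), ...
order_14 = optimum_order(14)   # the fifteen-member order without any pair containing 15
```

For fifteen members the list opens with the same pairs as the series written out above, and every one of the 105 possible pairs occurs exactly once.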
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Hl 
‘. TaBLE VII.—Ba.ancepD ORDERS FOR THE Opp NUMBERS FROM FivE TO SEVENTEEN 
fi, N = 65 8-6 8- 9 7-10 12- 1" 
(: 1-2 7-1 1- 4 8- 9 7- 6 
wi 5-3 4-3 3- 5 1- 3 8- 5 
aN 4-1 5-2 2- 6 2- 4 9- 4 
ay 3-2 6-9 1l- 7 13-5 10- 3 
yy 4-5 7-8 10- 8 12- 6 11- 2 
We 1-3 1-4 9-1 11-7 12-13 
at 2-4 3-5 5- 4 10- 8 1- 7 
By\i 5-1 2-6 6- 3 9-1 6- 8 
h 3-4 9-7 7-2 4-3 5- 9 
a 2-5 8-1 8-11 5- 2 4-10 
N=7 5-4 9-10 6-13 3-11 
1-2 6-3 1- 5 7-12 2-12 
7-3 7-2 4- 6 8-11 13- 1 
: ‘ 6-4 8-9 3- 7 9-10 7- 8 
: 5-1 1-5 2- 8 1- 4 6- 9 
| 3-2 4-6 11- 9 3- 5 5-10 
: 4-7 3-7 10-- 1 2= 6 4-11 
| 5-6 2-8 6- 5 13- 7 3-12 
ia 1-3 9-1 7-4 12- 8 2-13 
fy 2-4 5-6 8- 3 11- 9 
. 7-5 4-7 9 2 10- 1 N = 16 
4 6-1 3-8 10-11 5- 4 1- 2 
a 4-3 2-9 1- 6 6- 3 15- 3 
{i 5-2 5- 7 7-2 14- 4 
bi 6-7 N = 11 4-8 8-13 13- 5 
1-4 1- 2 3- 9 9-12 12- 6 
3-5 1l- 3 2--10 10-11 11-7 
2-6 10- 4 11- 1 1- 5 10- 8 
7-1 9 5 6 7 4- 6 9~ 1 
4-5 8- 6 5- 8 3- 7 3- 2 
3-6 q.1 4-9 2- 8 4-15 
2-7 3- 2 3-10 13- 9 5-14 
4-11 2-11 12-10 6-13 
N=9 5-10 li- i 7-12 
1-2 6- 9 N = 13 6- 5 8-11 
9-3 7- 8 . B 7- 4 9-10 
8-4 1- 3 13- 3 8- 3 1- 3 
7-5 2- 4 12- 4 9- 2 2- 4 
6-1 1l- 5 1l- 5 10-13 15- 5 
3-2 10- 6 10- 6 11-12 14- 6 
4-9 o- 7 - 7 1— 6 13— 7 
7 5-8 8- 1 8- 1 5- 7 12- 8 
6-7 4-3 3- 2 4-8 11- 9 
1-3 5- 2 4-13 3- 9 10- 1 
2-4 6-11 5-12 2-10 4-3 
9-5 7-10 6-11 13-11 5- 2 
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6-15 
7-14 
8-13 
9-12 
10-11 
1- 4 


3- 5 * 


2- 6 
15- 7 
14- 8 
13- 9 
12-10 
1l- 1 
5- 4 
6- 3 
7-2 
8-15 
9-14 
10-13 
11-12 
1- 5 
4- 6 
3-7 
2- 8 
15- 9 
14-10 
13-11 
12-1 
6- 5 
7- 4 
g- 3 
9- 2 
10-15 
11-14 
12-13 
1- 6 
5- 7 
4- 8 
3- 9 
2-10 
15-11 
14-12 
13- 1 
7- 6 
8 5 
TABLE VII. Continued


o 4 10- 1 3-7 
10- 3 8-2 2-8 
11- 2 4-17 17- 9 
12-15 5-16 16-10 
13-14 6-15 15-11 

1-7 7-14 14-12 

6- 8 8-13 13- 1 

5- 9 9-12 6- 5 

4-10 10-11 7-4 

3-11 1- 3 g- 3 

2-12 2- 4 Q- 2 
15-13 17- 5 10-17 
14-1 16- 6 11-16 

8- 7 15- 7 12-15 

9 6 14- 8 13-14 
10- 5 13- 9 1- 6 
ll- 4 12-10 5- 7 
12- 3 11-1 4- 8 
13- 2 4-3 3- 9 
14-15 5- 2 2-10 

1- 8 6-17 17-11 

7-9 7-16 16-12 

6-10 8-15 15-13 

5-11 9-14 14- 1 

4-12 10-13 7- 6 

3-13 11-12 8- 5 
2-14 1- 4 Q- 4 
15-1 3- 5 10- 3 

8-9 2- 6 1i- 2 

7-10 17- 7 12-17 

6-11 16- 8 13-16 

5-12 15- 9 14-15 

4-13 14-10 1- 7 
3-14 13-11 6- 8 
2-15 12-1 5- 9 

5- 4 4-10 
N = 17 6- 3 3-11 

1- 2 7-2 2-12 
17- 3 8-17 17-13 
16- 4 9-16 16-14 
15- 5 10-15 15- 1 
14- 6 11-14 8-7 
13-7 12-13 - 6 
12- 8 1- 5 10- 5 
11- 9 4- 6 11- 4 


12- 3 
13- 2 
14-17 
15-16 
i- 8 
7-9 
6-10 
5-11 
4-12 
3-13 
2-14 
17-15 
16- 1 
- 8 
10- 7 
ll- 6 
12- 5 
13- 4 
14- 3 
15- 2 
16- 1 
9 8 
10- 7 
ll- 6 
12- 5 
13- 4 
14-3 
15- 2 
16-17 
1- 9 
8-10 
7-11 
6-12 
5-13 
4-14 
3-15 
2-16 
17- 1 
9-10 
8-11 
7-12 
6-13 
5-14 
4-15 
3-16 
2-17 
SUMMARY


The various orders for the presentation of pairs in the method of 
paired comparisons are reviewed. It is shown that orders of the form 
worked out by Kowalewski for five, seven and fifteen members, if 
balanced to avoid time and space errors, are superior to any others. 
A general method for finding orders for any number of members is 
developed and the characteristics of the orders discussed. Methods 
are suggested for performing the calculation of the orders. A table 
of the balanced optimal orders for odd numbers of members from five 
to seventeen is given. The optimum, balanced orders have the 
following advantages: (1) They maintain the greatest possible spacing 
between pairs involving identical members, (2) they are so balanced 
as to remove time and space errors, (3) they avoid regular repetitions 
which might have suggestion effects, (4) by repeating the series in 
reverse order fatigue effects may be balanced out, (5) from these 
orders for odd-numbers of members, the optimum even-numbered 
orders may be obtained by a simple rule. 
Nostrand Co., 1932, p. 32.

THE FACTOR THEORY AND ITS TROUBLES:
CONCLUSION. SCIENTIFIC VALUE


C. SPEARMAN 


1. PASSIVE RESISTANCE 


In the first three articles of this series we considered the mistake 
of supposing that the theory of Two Factors had ever failed to be 
corroborated by actual observation. This was a mistake mainly due 
to this theory having been greatly misrepresented. In the next 
article we allayed the suspicion which had arisen that g might not be 
unique. In the following or fifth article we exposed the error under- 
lying the attempt to impeach the theory on the grounds of logically 
inadequate methods of proof. 

In the present and concluding article we have to deal with a yet 
more fundamental objection; namely that these said Factors, even 
when perfectly established—empirically corroborated, uniquely deter- 
mined and rigorously proved—are still nullified by having little or no 
scientific value. 

By far the commonest and most effective way in which this last 
objection has been brought into play is by what may be termed 
“passive resistance.” Why introduce all the paraphernalia of theory,
some psychologists seem to ask themselves, when we are getting along 
quite well enough without them? 

Owing to such tacit indifferentism, many of the most fundamental 
results brought forth by the theory—such as, for instance, the correla- 
tion of g with the power to perceive relations and to invent correlates— 
have been ignored. Other results have indeed been appropriated, but 
in a concealed and vitiated manner. Against this voiceless, perverted, 
but infinitely tenacious opposition there appears to be little or no 
direct aid. We are reduced to the indirect approach by way of 
diligently considering those critics who do venture to speak openly. 


2. REPROACH OF BEING OBSCURE 


Now such openness of speech arrives in its lowest degree when, 
instead of the theory being left altogether unmentioned, it is summarily 
reproached with such epithets as “obscure,” “intangible” or (final
term of opprobrium!) “philosophical.”

To throw some light on this point, let us compare the Two Factor 
theory with its principal rival, the customary measurement of “intelligence,”
as presented in the “Mental Age” of Binet and his followers;
or, relatively to age, in the “Intelligence Quotient” of Stern.

We will first take up the matter from the purely quantitative 
viewpoint. In the case of the Two Factor theory, the fundamental 
equation runs as follows: 


$$g = r_{gt}\, t \pm .67\,(1 - r_{gt}^2)^{1/2} \qquad \text{(A)}$$


where the terms on the right are derived from test scores and employed 
in accordance with the theory. 
In the case of the Mental Age, on the other hand, we have 


$$\text{M.A.} = S(t) \qquad \text{(B)}$$


where the term on the right is simply the sum (or average) of the test 
scores. 

Now, when we proceed to compare (A) with (B) we note first 
and foremost that in the former the g is unique (as shown in Article IV
of this series). That is to say, whatever tests are used by any investi- 
gator, the magnitude aimed at—and save for the experimental error, 
actually obtained—will remain constant. This is the first necessity 
of science. How should, for instance, any rational study be made 
of the influence of heredity upon any magnitude if it is a substantially 
changing one? 

But this is just what we get when we turn from (A) to (B). In 
the latter, the M.A. fails to be unique; for it will vary to an indefinite 
extent, according to the tests that happen to have been used. Such 
tests are not selected on any definite principle. Nor does their 
procedure of summing or averaging possess any rational basis; it has 
only arisen by way of an unacknowledged and incomplete borrowing 
from the theory of Two Factors. As Brigham caustically remarked: 


the theory (of Two Factors) seems almost universally rejected in theory and 
accepted in practice. The Two-Factor theory is implicitly accepted when differ- 
ent tests are collected in a battery and added together in a single score.? 


The second great difference of (A) from (B) is that the former dis- 
plays its error, whereby this can be remedied. Whereas (B) supplies 
no such warning, and so allows the error to become terrifically large 
(see Article IV); the psychologist using it lives in a fool’s paradise. 
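
As a rough numerical illustration of what it means for an estimate to display its own error, the following sketch assumes that (A) has the usual regression form: the estimate of g is the correlation of the test pool with g multiplied by the pooled score, and its probable error is .6745 times the square root of one minus that correlation squared, everything being in standard measure. The function name and the sample values are hypothetical and are not taken from the article.

```python
import math

# The ".67" of the text, carried to four places (assumed to be the usual
# probable-error factor for a normal distribution).
PROBABLE_ERROR_FACTOR = 0.6745


def estimate_with_probable_error(r, t):
    """Regression estimate of g from a pooled score t (standard measure).

    r is the assumed correlation between the test pool and g.
    Returns the pair (estimate, probable_error).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    estimate = r * t
    probable_error = PROBABLE_ERROR_FACTOR * math.sqrt(1.0 - r * r)
    return estimate, probable_error


# Hypothetical case: the pool correlates .90 with g, and one person's
# pooled score stands 1.5 standard deviations above the mean.
estimate, probable_error = estimate_with_probable_error(0.90, 1.50)

# The error shrinks as the correlation rises, and vanishes only at r = 1.
for r in (0.70, 0.80, 0.90, 0.95, 1.00):
    _, pe = estimate_with_probable_error(r, 1.50)
    print(f"r = {r:.2f}   probable error = {pe:.3f}")
```

A plain sum such as (B) carries no corresponding term, so nothing in it indicates how far the obtained figure may lie from the magnitude sought.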





1 See Holzinger, K.: Statistical Résumé of the Spearman Two-Factor Theory. 
Univ. Chicago Press, 1931. 
2 Study of Error, 1932, p. 23. 


Let us turn from the quantitative to the qualitative aspect of the 
matter. It has been proved that g measures ability to perceive 
relations and to invent correlates. On the other hand, g has been 
shown not to measure the efficiency of sensory organs or of motor 
organs, or of retentiveness.1 Nothing of all this has anything obscure 
about it; the qualities are precisely defined and exhaustively exempli- 
fied. And their precision is all to the good even for the most extreme 
behaviorist. For besides defining the mental processes involved, they 
perform the same office for the physical stimulus and response. 

On turning to the Mental Age, here again we find not greater 
clarity, but far less. Its proponents rely everywhere upon the word 
intelligence. But they fail to set forth what they intend this word to 
mean. And this failure on their part is not merely a lack of exactitude; 
it is an absence of even the broadest outline. They leave unsettled 
whether their “intelligence” is to include any such pre-eminent 
regions of knowledge as memory, imagination, and sensory perception. 
Either they supply no definitions to aid us in settling this point; or 
if they have settled it, they take no heed of their settlement in actual 
practice. Thus Binet, for example, formally declared that memory 
is not “intelligence” but only “the great simulator of this.”2 Yet he 
himself in his actual measurement of Mental Age introduced tests of 
memory over and over again. And the same may be said of the great 
army of his successors. 

Not only do the advocates of “intelligence” thus fail to say what 
this is intended to include; they are no less vague as to what is not 
included in it. Their attitude nowadays is typified in the following 
experience. The present author mentioned to one of the most 
eminent authorities of the present day that, as just said, g does not 
measure sensory, motor, or retentive abilities. Surprisingly, he 
replied that this much “had always been known.” Surely he and 
others like him must have very short memories. 

Binet himself wrote as follows: 


a sensation, a perception are intellectual manifestations as much as reasoning is. 


See, too, Haggarty in the outstanding Symposium published in this 
Journal as late as 1931. He writes: 


intelligence is a practical concept connoting a group of complex processes 
traditionally defined in systematic psychologies as sensation, perception, 
association, memory, imagination, discrimination, judgment and reasoning. 



1 Abilities of Man, Chap. XI-XIII, 1927. 
2 Année Psych., XI, 1905, pp. 195-197. 


Or take the remainder of the definitions of intelligence given in this 
same Symposium. Several of these explicitly include “sensory” 
capacities, “motor responses,” “memory.” Almost all of them at 
least imply such abilities. And even in the rare cases where, as occurs 
with Terman, such abilities are not perhaps implied in “intelligence” 
as formally defined, they still are introduced into it as actually tested. 

We seem thus forced to admit that, even so far as concerns qualitative 
description, the theory of Two Factors is not more, but far 
less, obscure than that which is commonly put forward in connection 
with the word “intelligence.” 


3. REPROACH OF BEING HYPOTHETICAL 


Akin to the preceding charge of being obscure is that of being 
hypothetical. Now, primarily at any rate, g is not hypothetical at 
all; it is only a composite or function of the directly observed test 
scores. But almost inevitably the psychologist does feel impelled 
to go beyond this restricted field. He can scarcely resist the temptation 
to “interpret” the g, in the sense of attributing to it further 
and deeper significance. Thus, for instance, g has often been taken 
to represent some psycho-physical energy, or some efficiency of the 
genes. Into such interpretations there will almost inevitably be 
introduced much that is hypothetical. And here, let us freely concede, 
even some obscureness is more than likely to make its appearance. 

On the other hand, we must remember that physical science itself 
lies in precisely the same predicament. Its chief concepts (force, 
gravity, temperature, refraction, pressure, density, electricity, magne- 
tism, and so forth) are essentially hypothetical. And the very same 
excuse that serves physics must do also for psychology. It is that 
hypotheses, when introduced with moderation and reservation, are 
an extremely potent aid to thought. 

Still the charge may be brought that the interpretation of g 
not only involves hypothesis, but does so after a peculiar fashion. 
Whereas all previous investigations had tried but failed to start from 
a concept of ability and then measure it, the theory of Two Factors 
first (under certain conditions) found a measurement, and then 
proceeded to fit to it the appropriate ability. 

But this Ptolemaic reversal of procedure, although so new to 
psychology, is old and warranted enough in physical science. Take 
as example electricity. This from the very first was not defined 
logically, but only indicated by certain conditions. It was taken 
to be that something (whatever this might eventually prove to be) 
which caused rubbed amber to attract small feathers or bits of paper. 
But as investigators went on, this something was invested with 
further and further properties, which were more precise in themselves 
and more luminously supported by hypothesis. Such a progressive 
investment with significance would appear to be now attending the 
study of g. The time for the objection that its measurement has “no 
use until we know its real nature” is now long past. 


4. ATTRIBUTION TO HETEROGENEITY 


Throughout this article hitherto we have been considering the various 
attempts to impeach the Two Factor theory of being “obscure,” 
“intangible,” or “philosophical.” In the rest of this article we shall 
turn to the attacks that have been made along a different line. Here 
the theory is no longer accused of any defect in its own intrinsic 
nature. Instead, it is charged with not leading to scientifically fruitful 
consequences. 

One surprising utterance of this tendency has been that of Thomson, 
when he seems to wish to dispose of the whole matter summarily 
as follows: 


On no theory at all (sic), by the laws of probability, there will be a tendency 
towards zero tetrads differences among correlation co-efficients.1 


Small wonder that his former co-author (William Brown) and his 
former student (William Stephenson) join to say of this attempt: 


nothing can be more naive, nothing more mistaken.2 


Surely there is some unfortunate misunderstanding somewhere. 
At the very time that Thomson writes in this way, another author, J. H. 
Wilson, whilst repeatedly expressing his indebtedness to Thomson, 
explicitly accepts g and makes an extensive use of its determination.3 

Serious enough, on the other hand, would appear to be the objec- 
tion that has been raised against the general factor g on the ground 
that it might disappear if only the group of persons investigated could 
be freed from heterogeneity. Thus, Truman Kelley, whose words must 
always command if not acceptance at least respect, writes as follows: 


if the subjects tested are dissimilar with respect to maturity, race, sex, and general 
nurture, (i.e. nurture not affecting a single trait), a common factor due to this 
heterogeneity will appear ... This . . . leads us to wonder whether had we 
experimentally allowed for maturity, race, sex, and general nurture any general 



1 British Journal Psychology, XXIII, 1933, p. 404. 
2 British Journal Psychology, XXIV, p. 211. 
3 British Journal Psychology, p. 71ff. 


factor would have remained. In other words, we may wonder, if there is any 
factor at all independent of these things corresponding to Spearman’s idea of a 
“central fund of intellective energy” or “general ability,” or “g.”1 


This and further passages—if not in his own writings, at any rate, 
in those of his followers—would seem to have implied that g is little 
more than a statistical artifact. 

Unfortunately, this offence of neglecting to take into account 
such factors as age, etc., he specifically lays to the charge of the 
present writer. He writes: 


Spearman’s groups typically have not been children of the same age. And he 
has not resorted to a partial correlation technique to reduce his data to a constant 
age basis. (See his Crossroads, page 17.) 


In truth, far from neglecting the “partial correlation technique,” 
I had the good fortune, I believe, of being the first to introduce it into 
psychology. Certainly, my own earliest application of it consisted 
in reducing the data to a “constant basis” in respect of these very 
influences, age, sex, and so forth. And I venture to claim that no 
other writer has ever since studied these influences more laboriously 
or laid on them greater emphasis.2 By an irony of chance, Kelley, in 
this very book, so strongly asserting the need of reduction to a constant 
basis in these respects, and so strangely reproaching me with not 
making it, appears not to have made it himself. 
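
For readers unfamiliar with the reduction to a “constant basis,” the following sketch shows the standard first-order partial correlation, by which the correlation between two tests is corrected for an influence, such as age, that both share. The formula is the textbook one and is not quoted from this article; the function name and the three sample correlations are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x with y after the influence of z is held constant.

    Standard first-order formula:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical case: two tests correlate .60 in a group of mixed ages,
# and each of them correlates .50 with age.
raw = 0.60
at_constant_age = partial_correlation(raw, 0.50, 0.50)
print(f"raw correlation:             {raw:.3f}")
print(f"correlation at constant age: {at_constant_age:.3f}")
```

In this hypothetical case the correlation falls when age is held constant but does not vanish, which is the kind of question the technique is meant to settle by computation rather than by assumption.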

Passing, however, from the historical to the purely scientific 
aspect of the matter, we find Kelley’s words suggesting that my idea 
of a “central fund of intellective energy, or general ability, or g” is 
the idea of something independent of such influences as maturity, 
race, sex, and so forth. But in truth, any idea of this kind would seem 
to me the height of absurdity. Of course, g may be dependent on any 
or all of these or other influences. The theory does not attribute 
the magnitude of g to a miracle! Its very problem is to ascertain 
which of such influences are effective and in what degrees. And in 
solving it the theory is guided not by assumption, but by investigation. 


5. PRE-ARRANGEMENT OF RESULTS 


Still more drastic than the preceding attempt to represent g as an 
artifact of statistics is the more recent tendency to suggest that it is 
an artifact of experimentation. But whereas the former reproach 
emanated principally from one single high authority, the latter 


1 Crossroads in the Mind of Man, pp. 109-110. 
2 American Journal of Psychology, 1904, pp. 95-99, 223, 226-227, 260-279, and 
in countless other places subsequently. 


suggestion seems to have mostly come from numerous authors, who, 
on this topic at any rate, are decidedly amateurish. The criterion of 
hierarchy (or zero tetrads) is satisfied by the correlations—so some- 
times runs the charge—merely because experimenters have picked 
out for use just those tests whose correlations happen to satisfy it. 
But against this suggestion there stands the recently demonstrated fact 
that such procurement of hierarchy by means of selection is practically 
impossible (see Article IV of this series). Still more convincing is the 
fact that the satisfaction of the criterion has occurred nearly as often 
in the correlations published by the opponents of the theory. Fur- 
thermore, as abundantly shown in the preceding articles (especially 
III) the theory of Two Factors does not require that the satisfaction 
of the criterion should be universal, or even wide-spread, but only 
that it should occur sufficiently to admit of g being calculated. The 
theory only demands in fact that somewhere or other within the wide 
sphere of cognitive ability the criterion should find a home. And 
such a home could—if otherwise desirable—be found in the very 
correlations produced by the opponents of the theory. 
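
The criterion of zero tetrads referred to above can be sketched numerically. The example assumes the usual definition: for any four tests a, b, c, d, the products r_ab·r_cd, r_ac·r_bd and r_ad·r_bc should be equal, so that their differences vanish, whenever the correlations arise from one general factor alone. The loadings, the amount of added overlap, and the function names are hypothetical, and sampling error is ignored.

```python
from itertools import combinations


def one_factor_correlations(loadings):
    """Correlation table in which r_ij = loading_i * loading_j (i != j)."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


def tetrad_differences(r):
    """All tetrad differences for every set of four tests in the table."""
    differences = []
    for a, b, c, d in combinations(range(len(r)), 4):
        p1 = r[a][b] * r[c][d]
        p2 = r[a][c] * r[b][d]
        p3 = r[a][d] * r[b][c]
        differences.extend([p1 - p2, p1 - p3, p2 - p3])
    return differences


# Hypothetical saturations of five tests with the general factor.
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
r = one_factor_correlations(loadings)
largest = max(abs(x) for x in tetrad_differences(r))

# Now let the first two tests share something over and above g ("overlap").
r_overlap = [row[:] for row in r]
r_overlap[0][1] = r_overlap[1][0] = r[0][1] + 0.15
largest_with_overlap = max(abs(x) for x in tetrad_differences(r_overlap))

print(f"largest tetrad difference, g only:       {largest:.6f}")
print(f"largest tetrad difference, with overlap: {largest_with_overlap:.6f}")
```

With one general factor alone every tetrad difference is zero apart from rounding; a single pair of overlapping tests is enough to disturb the criterion, which is why a table cannot be made to satisfy it merely by gathering tests at will.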

Other objectors have urged instead that the theory only takes out 
of the tests those abilities which it has first put into them. Of this 
tenour is the following recent statement: 


Dr. Spearman begins with items which he thinks may be intellectual and then 
comes out with results that are directly dependent upon his original selection 
of them.2 


This is the reverse of the truth. We actually began with items as 
representative as possible of what anybody had ever thought to be 
“intellectual.” To many of these items we ourselves should not have 
dreamt of giving this title of “intellectual.” But we could afford to 
let all items pass alike. For the procedure prescribed by the theory 
does not lead to results dependent on their original selection. 

On a par is the following line of argument, which scarcely ventures 
into print, but circulates freely enough in lobbies: 


The tests have to be carefully selected, it is said, with the aid of preliminary 
experimentation. If any of the tests are then found to be throwing out the 
hierarchy, it is assumed that this is due to overlap even if direct inspection failed 
to reveal such overlap. 


Nothing, it must be replied, could be more groundless. The pro- 
ponents of the theory make no “assumption” whatever about the 





1 British Journal of Psychology, V, 1912, p. 60. 
2 Wellman: Conference on Research in Child Development, Chicago, 1933. 


nature of the additional bond between any two test scores that is 
found to exist over and above the correlations due to the general 
factor. Instead, they always proceed to settle this point on the basis 
of the entire available evidence. 


6. WISDOM AFTER THE EVENT 


Whereas the preceding objectors brought the accusation that the 
experimenter arranged beforehand what his results were going to be, 
other critics have complained that the experimenter failed to know 
his results beforehand. If this be so, it was said, then the theory 
comes too late; the experimenter is only wise after the event. 

Now, taken in this generality, this further and opposite complaint 
would appear to be hardly worth considering. Generalizations 
reached from the theory of Two Factors have been verified over and 
over again. An example we have already met in the avowal of Kelley, 
that his results “were quite remarkably in harmony” with those 
reached in the theory long before. As another instance take the 
recent finding of Anastasi: 


The chief conclusion that can be drawn from the present study is, clearly, that 
we cannot speak of a single common factor running through all forms of memory. 


This is very nearly what had been found years before by the Two 
Factor theory, as shown in the following passage: 


as for the . . . tendency to retain dispositions, this has shown itself not to possess 
any such (universal) functional unity (although commonly assumed to do so). 


Or let us look more generally at the whole list of “broad” factors 
announced in the Abilities of Man; they were verbal, memorial, 
arithmetical, geometrical, imaginative, mechanical and psychological. 
All subsequent investigations—though certainly conducted in no 
friendly mood—have brought confirmation to most of these older 
results; have produced contradiction to none of them; and have 
even made no fundamental additions to them. 

But although it is thus absurd to charge the Two Factor theory 
with not having conferred any power of prediction, yet in many fields, 
no doubt, this power is still very limited. Especially conspicuous 
here is the case of “overlap.” The occurrence of this, although 
indeed always asserted and even emphasized by the theory, has not 
yet been rendered predictable by definite formula. It has been traced 
in large measure to excessive similarity; but no precise rule has been 
furnished as to how much likeness really is “excessive.” 

Here again, however, the situation is in no worse case than that 
even of the natural sciences. Compare the preceding limitation laid 
down about the mental law of overlap with that of the physical law 
which connects the length of a pipe with the note it produces. Just 
as overlap is only avoided when the dissimilarity is not “too small,” 
so too the asserted length of the pipe only holds so long as this is not 
“too short.” Or again take Hooke’s law that the elastic modulus = 
stress divided by strain. This law only holds within the limits of 
perfect elasticity; and the elasticity is only perfect so long as the 
distorting forces are not “too great.” Here science is left with the 
further task of determining in respect of every different substance 
what are the limits at which the distorting forces are not “too great.” 
The more substances are measured in this way, the more complete of 
course becomes the science of elasticity. 

And such a path is necessarily being followed also by psychology 
in respect of overlap. The more exhaustively this is studied, the 
better becomes the power of anticipating its occurrence. And 
although, as said, this power has not yet attained the perfection that 
would be bestowed by an exact formula, still after its cruder fashion 
it is already in the majority of cases quite effective. 


7. EPILOGUE TO THE WHOLE SERIES 


So ends our story of the chief troubles which have befallen the 
theory of “Two Factors.” A few of these have consisted in criticisms 
that were fair, shrewd, and even profound. Such have resulted in 
placing this theory upon an increasingly solid foundation. But as 
for the rest, many a critic seems to have been either too anxious to 
avoid admitting that he had been in the wrong, or else too hopeful 
that the destruction of the work of others will “make their glory his.” 
Such a writer has often pushed and maintained his opposition to the 
extent of futile vexatiousness. The sad result of all such excessive 
opposition has been that numberless psychologists are still working 
unsupported by any theoretical basis. They are still complacently 
going on with what may, without great exaggeration, be called their 
mass production of disastrously erroneous measurements of a preposterously 
equivocal “intelligence.” If but a tithe of all the efforts 
devoted to futile criticism had instead been expended on positive 
investigation, what a very different picture we might have of science 
by this time! 

Even as things have been, however, the last quarter of a century 
seems to show a steadier progress here than in any other region of psy- 
chology. Before long, it is hoped, the present account of obstruction 
that failed will be followed up by one of construction that succeeded. 


MAKING USE OF A GENERAL OBJECTIVE IN 
EDUCATIONAL RESEARCH 


W. A. SAUCIER 
John Fletcher College 


The confusion prevalent in education has led some to advocate 
an educational philosophy “comprehensive in its outlook” and 
“consistent in its several departments.”1 Such a broad view of 
education should enable us to determine our values with reference 
“to the whole of our beings: all our tastes, desires, and capacities” 
and “to the whole of our lives in point of time” rather than to merely 
a particular part.2 This comprehensive conception of education 
implies belief in a general objective to unify and harmonize educational 
effort. The use of such a broad, vital ideal, or guiding principle, in 
educational research, is a fundamental need of education today. 

As Dewey has well stated, among those who accept the principle 
of general objectives there is close agreement concerning the nature 
of these objectives. Essentially, there is common stress on the 
continuous development of the individual’s traits, for life in a society 
of confusion and change. As suggested by one writer: “Our civilization 
demands increasingly the ability to cooperate with others and 
the ability to imagine the needs and desires of those” outside of one’s 
immediate group.3 More in detail: “Today we are concerned not 
only with the academic child—but with the whole child, which will 
include his attitudes and ideals, his likes and dislikes, his fears and 
worries, his conflicts and inhibitions, his unified and integrated outlook 
on life and the many little habits and skills of social adaptation.”4 
As presented by another: “The function of the school is the guidance 
of children in the development of habits and attitudes leading to the 
socially adjusted personality.”5 In other words, the two functions 





1 Counts, G. S.: “Philosophy and Research.” School and Society, Vol. XXX, 
July 27, 1929, pp. 103-107. 

2 Adams, J. T.: “Standards.” Forum, Vol. LXXX, October, 1931, pp. 232-239. 

3 Curry, W. B.: “An Educational Tragi-Comedy.” The New Republic, 
Vol. LXVIII, August 19, 1931, pp. 17-19. 

4 Symonds, Percival M.: “The Contribution of Research to the Mental Hygiene 
Program for Schools.” School and Society, Vol. XXXIV, July 11, 1931, pp. 39-49. 

5 Williams, Eula S.: “A Personality Rating Form for Elementary School 
Pupils.” The Elementary School Journal, Vol. XXXIV, September, 1933, pp. 16-29. 




of the school are “in the training of the thinking powers” and in the 
development of “social intelligence.”1 This means that a general 
objective should “emphasize the need for freeing intelligence.”2 And 
this point of emphasis is in keeping with the position of Bode that 
education should take as “its chief point of departure the problem 
of the liberation of intelligence.”3 

Obviously, certain points of emphasis are common among all of 
these expressions of an ideal, directive aim, or general objective. One 
characteristic feature is the demand for development of personality 
traits in each individual rather than for success in the acquisition of 
facts and mechanical skills. Then there is the strong implication 
that the end involves action, performance, doing, instead of mere 
possession of information. Again, there is obvious recognition of the 
principle of individual differences in the attainment of the objective, 
in opposition to belief in inflexible standards, set up by either tradition 
or research. Further, concern for the individual includes concern 
for the personality of the child no less than of the adolescent or the 
adult. Finally, these expressions of a general objective are not 
based on prejudices for particular school subjects or for life in a static 
society but on the demands of the present complex and changing 
social situation. 

Thus we can see that the common reference of educators to devel- 
opment of the individual’s attitudes and intelligence, for social 
adjustment, implies belief in an individualistic-social aim. This is 
fundamentally a belief in the development of personality traits for 
social reconstruction. 

Such a view coincides with Dewey’s conception of the democratic 
ideal as the directive aim of education.4 This ideal has been suggested 
by him in his presentation of the two criteria for judging a 
democratic society. A society is democratic as there are numerous 
shared common interests among individuals and free interplay between 
classes, or groups. For example, bandits are undemocratic in that 
interests within the group extend little further than the one interest 
of plunder and there is a lack of free interaction with other groups. 





1 Foster, F. M.: “Laissez-Faire and Education.” School and Society, Vol. 
XXXVII, April 22, 1933, pp. 524-526. 

2 Pyrtle, Ruth: “Vital Values in Education.” School and Society, Vol. XXXII, 
July 12, 1930, pp. 41-44. 

3 Bode, Boyd H.: Conflicting Psychologies of Learning, p. 297. 

4 See Dewey, John: Democracy and Education, pp. 94-102. 


Likewise America is more democratic than India, for we enjoy a 
wider range of mutual interests and have nothing comparable to the 
caste system. In the democratic society shared common interests 
and social intercourse between groups contribute to fostering a humane 
spirit and ability to think, the essentials of a democratic person. 
Thus these criteria of the democratic society furnish an ideal as an 
objective and guiding principle for education. Such is the ideal 
democratic society by means of which and for life in which we should 
educate our boys and girls. 

It is obvious that considerable consensus of opinion exists concerning 
the nature of a broad directive aim for education. What is 
chiefly needed today is not a new presentation of an ideal, or general 
objective, but consistent and persistent application, in research, of 
our present widely accepted objective. Consideration of a few 
instances should reveal this fact. 

Proper regard for the individuality of each child makes one doubt 
the possibility and advisability of using research to determine suitable 
subject-matter. Educational scientists have revealed the nature and 
the extent of individual differences and have expressed concern about 
providing for these differences. Yet some leaders in research propose 
to know definitely the bits of subject matter of most worth for children 
in arithmetic, history, science, and other subjects by analyzing 
society to discover the use adults make of these subjects.1 They 
would construct a “blue print of a perfect citizen” and “blue prints 
of social efficiency,” consisting of thousands of items that should go 
into the scientifically constructed curriculum. Such a procedure is 
considered as the “starting point in making educational programs.”2 
Then they would determine through statistical procedure when the 
“average” child of each grade can most efficiently learn the particles 
of subject matter thus prescribed. 

A lack of consistency is evident. Concern for standardized subject 
matter conflicts with proper regard for individual development, for 
social reconstruction. Some researchers apparently have failed to 
recognize what psychologists have demonstrated, that there is in 
reality no average person but that “every person is unique” and 


1 Strayer, G. D.: “The Scientific Approach to Problems of Administration.” 
School and Society, Vol. XXIV, Dec. 4, 1926, pp. 685-695. 
2 Peters, C. C.: Foundations of Educational Sociology (Revised Edition), 
Chapters V, VI. 


“possesses individuality.”1 It seems, too, that they do not consider 
that each child’s capacity, interest, and need varies from day to day 
according to his experiences. To be specific, a pupil in the third grade 
is ready to learn to spell the words hen, chicken, eggs, farm, feed, 
corn, and the like, only as these words have become a part of his 
experiences. This means that the blue print is as valueless for the 
teacher as for the physician. Thus it is evident that suitable subject 
matter is determined best for the individual not by scientific technique 
but by the keen observation of an intelligent teacher. A full realiza- 
tion of this should prevent research from taking two conflicting 
positions: On the one hand stressing consideration of the extent and 
variableness of individual differences and on the other hand advocating 
a blue-print type of curriculum for the individual. 

In like manner, full appreciation of the principle of education for a 
changing society causes one to question the dependence on research 
to determine subject matter for future needs. To appreciate this, one 
needs to note the differences between life only five years ago and 
that of today. Moreover, as we have shown, adjusting oneself to 
change, and especially contributing to change, requires proper attitudes, 
a democratic philosophy of life, and ability to think. These are 
developed best when the increasing interests and enlarging capacities 
of growing boys and girls direct the work of the classroom rather than 
subject-matter determined in advance by either science or tradition. 

Particularly noticeable has been the failure of the researcher to 
consider general objectives in studies of the best method of teaching.2 
In making comparisons of methods of teaching, the researcher seem- 
ingly has adopted the traditional immediate aim of memorization of 
facts as the chief objective of education. This is evident from the 
common failure to mention any general objective and from the 
practice of testing solely for facts, by means of the objective test, at 
the end of the experiment. Further, the experimenter usually has 


1 Gates, A. I.: Elementary Psychology (Revised Edition), p. 549. 

2 For illustrations see: Marston, R. B.: “Individualized vs. Group Instruction 
in the Sisterville, West Virginia, High School.” School and Society, Vol. XXXIII, 
Jan. 10, 1931, pp. 59-60; Scheidmann, Norma V.: “Comparison of Two Methods 
of College Instruction.” School and Society, Vol. XXV, June 4, 1927, pp. 672-674; 
Douglas, Harl: “Controlled Experimentation in Methods in College Instruction 
at the University of Oregon.” School and Society, Vol. XXVII, June 2, 
1928, pp. 663-665; Funk, M. N.: “A Comparative Study of the Results Obtained 
by the Method of Mastery Technique and the Method of Daily Recitation and 
Assignment.” School Review, Vol. XXXVI, May, 1928, pp. 338-345. 


followed scientific technique only partially, for often he has not 
controlled such factors as the teacher and the interests or attitudes of 
students. Yet in some instances he has drawn definite conclusions 
that one method is slightly superior or that there is no appreciable 
difference between the two. Sometimes, however, such conclusions 
have been modified by the statement that further experimentation is 
necessary before the positions of the teaching techniques can be 
established. 

Indeed we agree with researchers that further studies of methods 
of teaching are needed, studies that evaluate a teaching technique in 
accordance with its degree of contribution to the attainment of an 
acceptable general objective. No method can be truly judged as 
superior unless it results in improvement of social attitudes, ideals, 
appreciations, habits of thinking, and the like. Furthermore, the 
usefulness of any method varies from time to time in the flexible 
education program required for such outcomes. To develop growing 
boys and girls emotionally and intellectually, for social adjustment 
in a confused world, demands constant shifting of procedures. The 
value of a certain teaching procedure at any time depends on the whole 
classroom situation at that time. Accordingly, it rests on the teacher 
to choose the appropriate method. Incidentally this requires that 
the teacher will possess a humanistic spirit and ability and freedom 
to think, which is a challenge to teachers colleges and school adminis- 
trators. Thus we see the futility of research attempting to establish 
the degree of usefulness of any particular technique of teaching, 
especially without aid from a directive aim. 

Perhaps nowhere in education has there been more need for 
guidance from a general objective than in the field of achievement 
tests. Enthusiasts for the new-type examination stress the fact 
that this test measures achievement more efficiently than the discussion 
examination does, but rarely do they state explicitly the nature of the 
achievement they have in mind. In so far as researchers in this 
field give evidence of following any objective, they appear to have 
accepted the traditional, immediate aims of the acquisition of facts 
and mechanical skills. This inference is justified on the ground 
that the new-type examination measures these results rather than the 
major outcomes of education: self-expression, organization of ideas,
comprehension, judgment, sensing relationships, and the like. Seem- 
ingly, therefore, the educational scientist in his promotion of this 
examination has accepted education as it is with little or no concern 
for the promotion of progressive ends.¹ With secondary concern
about what he has been measuring, he has concentrated on accuracy
of measurement and how to measure, or the technique of testing.²

But we miss the main point at issue in stressing the technique of 
measuring apart from and to the neglect of what is measured. To 
measure objectively relatively unimportant outcomes is of little 
consequence. This is shown in the comparison the researcher com- 
monly makes between the objective test and the discussion examina- 
tion. Ordinarily promoters of the new-type examination contrast 
at length its large sampling, objectivity, reliability, validity, etc., with 
the small sampling, subjectivity, unreliability, and lack of validity 
of the discussion examination. But the objective test, evaluated in 
connection with a suitable general objective, is found lacking in 
validity. As pointed out by Odell, “‘to test knowledge of isolated 
facts and details” new-type tests possess superior validity, but to 
measure ‘‘originality, initiative, power to organize, to interpret, 
to analyze and synthesize, and various other reasoning processes to 
some extent,”* the discussion examination is higher in validity. 
Appreciation of this fact should prevent some educational scientists 
from making sweeping claims concerning the validity of the new-type
examination. 

The mania for an objective, scientific instrument of measurement 
has led to making this measuring instrument essentially the end of 
education. States and counties use objective tests for judging the 
progress of schools and to determine superior pupils in “scholarship”
contests. The teacher consciously or unconsciously directs her teach- 
ing definitely toward preparation for these examinations. Indeed 
any kind of examination invariably affects the method of teaching. 
This is as it should be, for there should always be harmony between 

1 As evidence, see the ordinary book on educational tests. Also consider
Eurich, Alvin C.: “Four Types of Examinations Compared and Evaluated.”
The Journal of Educational Psychology, Vol. XXII, April, 1931, pp. 268-278;
Shulson, Violet and Crawford, C. C.: “Experimental Comparison of True-False
and Completion Tests.” The Journal of Educational Psychology, Vol. XIX,
November, 1928, pp. 580-583; Krueger, W. C. F.: “An Experimental Study of
Certain Phases of a True-false Test.” The Journal of Educational Psychology,
Vol. XXIII, February, 1932, pp. 81-91.

2 For an example of one who, as an exception, senses the necessity of relating
testing to progressive objectives, see Tyler, R. W.: “Formulating Objectives for
Tests.” Educational Research Bulletin, Vol. XII, October 11, 1933, pp. 197-206.

3 Odell, C. W.: Educational Measurement in High School. Pp. 474, 482.

objectives and teaching procedure and between teaching procedure
and the examination. 

There could be no objections to considering the new-type examina- 
tion as the end of education if this end coincided with an acceptable 
general objective. On one occasion, when the writer charged that 
this examination was becoming the end of education, he received the 
reply that if such is true the fault lies in the teacher, not in the objec- 
tive test. But, as we have shown, a method of measuring will and 
should become the directing aim for the teaching process. This 
means that one should not object to consideration of the new-type 
examination as the end of education unless one is willing in doing so 
to criticize adversely the nature of this test. In other words, we do 
not need to defend a suitable means of measurement against the 
charge that it is becoming the objective of education. 

The previous discussion indicates the necessity of having a unified, 
harmonious educational program, with all of its parts related to a 
general objective. There is a tendency to forget a comprehensive 
aim in both teaching and testing. This is due largely to the ease of 
teaching and testing for immediate, specific, and somewhat tangible 
ends in comparison with the difficulty of teaching and testing for the 
more remote, general, and intangible ones. Accordingly, in this 
consistent educational program, we advocate a common general 
objective as a basis for a teaching procedure consisting of individual 
and group problems or projects, directed study, a socialized recitation, 
incidental observation of pupils while studying, composing, or con- 
structing, and oral and written discussions within and without the 
recitation period. Such is the ideal, integrated classroom program 
required for the development of intelligence and social attitudes in the 
individual. 

The researcher is challenged to share in the progressive develop- 
ment of such an educational program. To do so requires refraining 
from isolated and relatively useless experimentation, as, for example, 
studies to discover the most suitable grade for teaching Ivanhoe, what 
people read in the school paper, and whether pupils should add from 
the bottom or the top of the column. Instead, the educational 
scientist could study more profitably a school as a whole over a period 
of time, a school directed by a general objective that reflects the 
demands of the social situation. We need more experimentation 
such as Collings’ famous study of the value of the project method of 
teaching, but extending over a longer period of time and with the 
factors more fully controlled. What the researcher should appreciate
is that only as we study the parts of an educational program in connec- 
tion with the whole, that is, in relation to other parts, can we finally 
discover its value. He should consider growth of the whole individual 
in the whole environment of the school. 

At present there is special need of much research toward the 
obtainment of the intangible but more important ends of education. 
Research has erred in following the line of least resistance as it has 
furnished education with an oversupply of discoveries in the teaching
of mechanical skills and isolated facts. There is a noticeable lack of
experimentation dealing with the development of attitudes,¹ ideals,
initiative, creativeness, insight, and reasoning. More specifically, one 
outstanding gap in research today consists of the lack of instruments 
of measurement for checking on such major outcomes both in research 
and in instruction. Similarly, for a fuller appreciation of the sig- 
nificance of the early environment in shaping personality, education 
could profit by many more thorough case studies of babies and of 
children, in favorable and unfavorable homes.² Research is at its best
in the consideration of the whole child in a continuous life-like setting. 

There are those who apparently excuse the shortcomings of
educational science on the ground that it is a young science. They 
explain that we can expect this young science to grow in the next 
seventy-five or one hundred years somewhat as physics and chemistry 
have grown in the last century. Obviously one is safe in making 
such a prediction to this generation without future injury to one’s 
reputation as a prophet. Regardless of what the future may reveal, 
we should realize that just as babyhood and childhood are the most 
important periods for direction and correction of the individual so 
the very youthfulness of the science of education indicates its special 
need of guidance and criticism. We have rejected the doctrine of 
natural unfoldment in the development of the child, and we must do 

1 As an illustration of a beginning effort to measure in this field, see Peterson,
Ruth C. and Thurstone, L. L.: “The Effect of a Motion Picture Film on Children’s
Attitudes Toward Germans.” The Journal of Educational Psychology, Vol.
XXIII, April, 1932, pp. 241-246.

2 A step in the right direction may be seen in two similar studies: Burks,
Barbara: “The Relative Influence of Nature and Nurture upon Mental
Development.” The Twenty-Seventh Yearbook, Part I, Chapters X, XI, 1928; Freeman,
F. N.: “The Influence of Environment on Intelligence, School Achievement and
Conduct of Foster Children.” Twenty-Seventh Yearbook, Part I, Chapter IX,
1928.

the same in the growth of educational research. The common
tendency of the child to reject restraint and to follow his own inclina- 
tions seemingly is repeated in the science of education today. Sim- 
ilarly, educational research exemplifies the tendency of the child to 
prize the new and the novel, regardless of its worth. Thus the 
youthfulness of educational science furnishes further evidence for 
the conclusion that its continued growth and increasing usefulness 
depend on guidance from a vital, modern, and comprehensive objective
of education. 

One is justified in concluding that research has frequently supported 
traditional and reactionary practices in education. The science of 
education will take its position solely on the side of progress in
education when it more fully adopts and applies the principles of 
progressive education. To this end, we suggest that in performing 
each experiment, the researcher should consider the broad directive 
aim of education as carefully as he does the steps of his experiment. In
addition, in the report of the experiment, this aim should be stated 
explicitly, in order that the unsuspecting reader may not be misled. 
Such persistent acceptance and use of a modern general objective 
should result in the elimination of useless and conflicting experi- 
mentation and the building of a consistent and progressive program 
of education. 